Abstract — We propose a parametric class of myopic scheduling
and routing policies for open and closed multiclass queueing networks.
In open networks, they steer the state of the system toward
a predetermined and fixed target, while in closed networks
they steer instantaneous throughputs toward a fixed target. In both
cases, the proposed policies measure distance from the target using
a weighted norm. In open networks, we establish that for an L2
norm the corresponding policies are stable. In closed networks,
we establish that with proper target selection the corresponding
policy is efficient, that is, it attains bottleneck throughput in the
infinite population limit. In both open and closed networks, the
proposed policies are amenable to distributed implementation using
local state information. We exploit the work in a previous paper
to select appropriate parameter values and outline how optimal
parameter values can be computed. We report numerical results
indicating that we obtain near-optimal policies (when the optimal
can be computed) and significantly outperform heuristic alternatives.
This work has applications in a number of areas including
optimizing the processing of information in sensor networks.

Index  Terms — Fluid  models,  multiclass  queueing  networks, 
routing,  scheduling,  sensor  networks. 


I.  INTRODUCTION 

WE  CONSIDER  the  problems  of  scheduling  and  routing 
in  open  and  closed  Markovian  multiclass  queueing 
networks  (MQNETs).  Such  networks  process  jobs  that  belong 
to  multiple  types,  differing  in  their  arrival  processes,  routes 
through  the  network,  processing  times,  and  cost  per  unit  of 
waiting  time.  Scheduling  or  sequencing  decisions  determine 
which  job  is  being  processed  at  each  point  in  time  in  the  various 
network  nodes.  Routing  decisions  determine  the  sequence 
of  nodes  at  which  a  job  undergoes  processing  as  it  traverses 
the  network.  In  open  networks,  the  objective  is  to  minimize  a 
weighted  sum  of  mean  waiting  times,  while,  in  closed  networks, 
we  seek  to  maximize  a  weighted  sum  of  mean  throughputs  over 
the  various  job  types. 

Such problems have applications in a number of domains,
including manufacturing systems, multiprocessor computer
systems, communication networks, and clusters of computing
servers. One particular application area of interest concerns
sensor networks. These networks consist of “sensing nodes”
and “processing nodes.” The term “processing nodes” is meant
to describe all nodes that collect and process information gathered
by the sensors. Such processing can range from simple
storage to full processing of data in order to make decisions
(e.g., as a controller in a networked control system). Oftentimes,
information collected by sensors needs to undergo several
stages of processing at several processing nodes. Sensor
networks typically operate in adverse environments using
battery-powered sensors with limited local processing capabilities.
It follows that the “response time” of processing nodes
needs to be highly optimized to avoid losing information
from sensor nodes that are nearing the end of their lifetime,
or to avoid delayed action based on critical and time-sensitive
information. We can model the collection of processing
nodes as an MQNET. Control actions that affect performance
include routing and scheduling. Routing includes both routing
of messages from sensor nodes to processing nodes and
routing of messages between processing nodes. Scheduling
can likewise be done both at the processing node level, among jobs
that wait to be processed, and within a processing node, among
jobs that wait to access the various node resources (e.g., CPU,
disk, network interface card).

Performance analysis in MQNETs is notoriously hard. Naturally,
optimizing an MQNET is even harder. A version of the
scheduling problem we consider was shown to be EXP-complete
[2]. Under Markovian assumptions the problem can be formulated
as a stochastic dynamic programming (DP) problem.
This is not very useful in practice for two reasons: 1) Bellman’s
“curse of dimensionality” prohibits us from computing the optimal
policy in large instances, and 2) implementing the optimal
policy is rather challenging, since typically nodes need to have
global state information.

There is, by now, a fair amount of work on optimizing
MQNETs. A part of the literature has focused on heavy-traffic
Brownian approximations to derive policies in special cases
[3], [4]. The work in [1] and [5] provides a polyhedral relaxation of the
region of achievable performance and obtains bounds on optimal
performance. This relaxation is shown to be exact in Klimov’s
model [6]. Stability is an important and more basic question
than optimization. It should be noted that in open MQNETs
the usual condition of node utilizations being less than one
is not sufficient for the stability of all policies. A seminal
result in [7] establishes that the stability of a fluid model
is a sufficient condition for the stability of the stochastic open
MQNET. Several scheduling policies have been proposed for
open MQNETs, including fluctuation smoothing policies [8],
affine shifts of policies for the fluid model [9], tracking of
heavy-traffic-based policies [10], and tracking optimal trajectories
of the fluid model [11]. The latter policies perform
well far from equilibrium but not necessarily equally well in
steady state. Other approaches using diffusion models have
been proposed in [12], [13].

The work in [7] has been extended to closed networks with a
single job type [14]. Closed networks are always stable since the
total number of jobs in the network is constant. The notion of
efficiency of a scheduling policy has been introduced in [14] and
can be seen as analogous to stability in open networks. Specifically,
a scheduling policy is called efficient if it attains the maximum
throughput of the bottleneck node as the population of the
network grows large.

Our work is also related to problems in manufacturing systems
(see [15]–[19]). In this literature, optimal stochastic scheduling
policies have been shown to yield controlled dynamics that
follow piecewise linear trajectories characterized by attractors
of monotonically decreasing dimension leading to a point attractor
(termed hedging point).

Inspired by the body of work we outlined, in this paper we introduce
a class of scheduling and routing policies for MQNETs
which we call target-pursuing (TP) policies. In both open and
closed networks they “steer” appropriate state variables toward
a predetermined and fixed “target.” In open networks, this is
done for the vector of jobs present from all classes, while in
closed networks for the instantaneous throughput rates of all
job classes. These policies were especially motivated by two
observations: 1) state feedback tracking policies in control are
often effective, and 2) the polyhedral relaxations of the region of
achievable performance in [1] are often tight; thus, a policy that
seeks to maintain the state of the system in the neighborhood of
optimal points in these polyhedra can be rather effective. Our
main findings are as follows.

1) In open networks, we show that TP policies are stable
and in closed networks we establish that TP policies are
efficient. To that end, we work with a fluid model.

2) We demonstrate that TP policies are amenable to distributed
implementation without the need to maintain
global state information. This is key in making these
policies attractive to implement.

3) We discuss ways of tuning policy parameters, notably the
targets, in order to select the best policy within the class.

4) We provide a set of illustrative numerical results suggesting
that TP policies perform close to optimal (when it
can be computed) and outperform heuristic alternatives.

The  remainder  of  this  paper  is  organized  as  follows.  Section  II 
presents  our  model  of  open  MQNETs  where  only  scheduling 
is  subject  to  optimization.  Section  III  introduces  TP  policies 
for  open  networks.  Section  IV  discusses  implementation  issues. 
Section  V  establishes  stability.  Section  VI  outlines  how  to  tune 
policy  parameters.  Section  VII  considers  open  MQNETs  where 
routing is also subject to optimization. Section VIII focuses on
closed  MQNETs.  Section  IX  contains  our  numerical  results. 
Concluding  remarks  are  in  Section  X. 


Notational Conventions: Throughout this paper all vectors
are assumed to be column vectors. We use lower case boldface
letters to denote vectors and for economy of space we write
$\mathbf{x} = (x_1, \ldots, x_R)$ for the column vector $\mathbf{x}$. Matrices are denoted
by boldface upper case letters and prime denotes transpose. We
use $\mathbf{e}$ to denote the vector of all ones, $\mathbf{0}$ for the vector of all
zeroes, $\mathbf{e}_i$ for the $i$th unit vector, and $\mathbf{I}$ for the identity matrix. For
any event $A$, $\bar{A}$ denotes its complement and $1\{A\}$ its indicator
function. For any $\mathbf{x} \in \mathbb{R}^R$, we denote $|\mathbf{x}| = \sum_{i=1}^{R} |x_i|$. We also
use the weighted L2 norm

$$\|\mathbf{x}\|_{\boldsymbol{\beta}} = \sqrt{\sum_{i=1}^{R} \beta_i (x_i)^2}. \tag{1}$$

When we write $\|\mathbf{x}\|_2$ it is assumed that $\boldsymbol{\beta} = \mathbf{e}$.
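As a small illustration of these conventions, the following sketch evaluates $|\mathbf{x}|$ and the weighted norm in (1). The function names `l1_size` and `weighted_l2_norm` are hypothetical and chosen here only for illustration; vectors are represented as plain Python lists.

```python
import math


def l1_size(x):
    """Return |x| = sum_i |x_i|."""
    return sum(abs(xi) for xi in x)


def weighted_l2_norm(x, beta=None):
    """Return the weighted L2 norm of (1): sqrt(sum_i beta_i * x_i**2).

    When beta is omitted, all weights are one (the plain L2 norm).
    """
    if beta is None:
        beta = [1.0] * len(x)
    if len(beta) != len(x):
        raise ValueError("x and beta must have the same length")
    return math.sqrt(sum(b * xi * xi for b, xi in zip(beta, x)))
```

For instance, `weighted_l2_norm([3.0, 4.0])` evaluates the unweighted norm of the vector (3, 4).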

II.  Model  and  Key  Quantities 

In this section, we present the model of our open MQNET.
Initially, we consider only sequencing decisions.

Consider a network consisting of $N$ single-server nodes. Jobs
entering the network are processed at a series of nodes before
eventually leaving the system. Externally arriving jobs can
be of multiple types differing in their arrival processes, routes
through the network, processing requirements, and costs per
unit of waiting time. To account for jobs processed at different
nodes we define the class of a job as the pair of job type and node
at which it is receiving service. For example, for a network with
$K$ job types there can be up to $K \times N$ classes. Let $R$ be the total
number of classes.

We let $\sigma(r)$ denote the node at which class $r$ is served and
$C_i = \{r \mid \sigma(r) = i\}$ the constituency list of node $i$, that is, the
set of classes served at node $i$. Routing is probabilistic, namely,
when a class $r$ job finishes processing at node $\sigma(r)$ it is routed to
node $\sigma(r')$ and becomes a job of class $r'$ with probability $p_{rr'}$,
or leaves the network with probability $p_{r0} = 1 - \sum_{r'=1}^{R} p_{rr'}$.

Notice that we adopt the notational convention of identifying
the external (to the network) world as class zero. We denote by
$\mathbf{P} = [p_{rr'}]_{r,r'=1}^{R}$ the routing matrix, which, since the network
is open, is assumed to be substochastic, or equivalently the matrix
$(\mathbf{I} - \mathbf{P}')$ is invertible. External arrivals come according to $R$
independent Poisson arrival processes, one for each class, with
rate $\lambda_{0r}$ for class $r$. Finally, service times are independent of
anything else in the network and exponentially distributed with
parameter $\mu_r$ for class $r$.

Let $\mathbf{n}(t) = (n_1(t), \ldots, n_R(t))$ denote the vector of the
number of jobs present in the network from each class at time
$t$. Under the Markovian assumptions we have imposed, and
under a Markovian policy (i.e., a policy whose actions at time
$t$ depend on $\mathbf{n}(t)$ only), the network evolves according to a
continuous-time Markov chain with state $\mathbf{n}(t)$. Letting $\lambda_r$
denote the total (external and internal) mean arrival rate of class
$r$ jobs, the following traffic equations are satisfied:

$$\lambda_r = \lambda_{0r} + \sum_{r'=1}^{R} p_{r'r} \lambda_{r'}, \qquad r = 1, \ldots, R. \tag{2}$$

In matrix notation this system of equations can be written as
$\boldsymbol{\lambda} = \boldsymbol{\lambda}_0 + \mathbf{P}'\boldsymbol{\lambda}$, where $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_R)$ and $\boldsymbol{\lambda}_0 = (\lambda_{01}, \ldots, \lambda_{0R})$.

PASCHALIDIS et al.: TARGET-PURSUING SCHEDULING AND ROUTING POLICIES




Since the network is open, (2) has a unique solution given by
$\boldsymbol{\lambda} = (\mathbf{I} - \mathbf{P}')^{-1}\boldsymbol{\lambda}_0$. Let $\rho_r = \lambda_r/\mu_r$ be the fraction of time
server $\sigma(r)$ works on class $r$. The utilization of server $i$ is
$\rho^{i} = \sum_{r \in C_i} \rho_r$. We assume $\rho^{i} < 1$ for all nodes $i$; otherwise, the
network is unstable in the sense that $|\mathbf{n}(t)| \to \infty$ w.p. 1 (with
probability one) as $t \to \infty$.
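The traffic equations and the utilization condition can be checked numerically. The sketch below is illustrative only: it solves (2) by the fixed-point iteration $\boldsymbol{\lambda} \leftarrow \boldsymbol{\lambda}_0 + \mathbf{P}'\boldsymbol{\lambda}$ rather than by inverting $(\mathbf{I} - \mathbf{P}')$, which converges under the open-network assumption above. The function names and the zero-based indexing of classes and nodes are assumptions made here, not notation from the paper.

```python
def solve_traffic_equations(lam0, P, tol=1e-12, max_iter=100_000):
    """Solve lam = lam0 + P' lam by fixed-point iteration.

    P[r][rp] is the routing probability from class r to class rp.
    """
    R = len(lam0)
    lam = list(lam0)
    for _ in range(max_iter):
        new = [lam0[r] + sum(P[rp][r] * lam[rp] for rp in range(R))
               for r in range(R)]
        if max(abs(a - b) for a, b in zip(new, lam)) < tol:
            return new
        lam = new
    raise RuntimeError("fixed-point iteration did not converge")


def utilizations(lam, mu, sigma, num_nodes):
    """Return (rho, node_util): per-class loads and per-node utilizations.

    sigma[r] is the (zero-based) node serving class r.
    """
    rho = [l / m for l, m in zip(lam, mu)]
    node_util = [0.0] * num_nodes
    for r, node in enumerate(sigma):
        node_util[node] += rho[r]
    return rho, node_util


# Hypothetical re-entrant line: class 0 (node 0) -> class 1 (node 1)
# -> class 2 (node 0) -> exit.
lam0 = [0.3, 0.0, 0.0]
P = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
mu = [1.0, 0.5, 2.0]
sigma = [0, 1, 0]
lam = solve_traffic_equations(lam0, P)
rho, node_util = utilizations(lam, mu, sigma, num_nodes=2)
```

In this example every class sees the same total arrival rate as the external stream, and both node utilizations are below one, so the stability precondition holds.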

We are interested in a scheduling policy minimizing

$$\lim_{t \to \infty} \sum_{r=1}^{R} h_r E[n_r(t)] \tag{3}$$

where $\mathbf{h} = (h_1, \ldots, h_R)$ are given weights. Equivalently, we
seek to minimize a weighted sum of the mean queue lengths
where the expectation is taken with respect to the steady-state
distribution. Using Little's law this cost function can be easily
transformed into a weighted sum of the mean waiting times.
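The transformation mentioned above can be sketched as follows, assuming $W_r$ denotes the steady-state mean time a job spends in class $r$ (a symbol introduced here for illustration only). Applying Little's law class by class gives

```latex
\lim_{t\to\infty} E[n_r(t)] = \lambda_r W_r
\quad\Longrightarrow\quad
\lim_{t\to\infty} \sum_{r=1}^{R} h_r E[n_r(t)] = \sum_{r=1}^{R} h_r \lambda_r W_r ,
```

so that, under this reading, minimizing (3) amounts to minimizing a weighted sum of mean waiting times with weights $h_r \lambda_r$.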

III.  Target-Pursuing  Policies 

Next, we introduce the family of policies of interest. As
mentioned earlier, [1] and [5] provided a characterization of
the achievable region for the performance vector $\lim_{t \to \infty} E[\mathbf{n}(t)]$
under all Markovian, preemptive, and stable policies. By allowing
randomized (non-Markovian) policies, the resulting
achievable region can be seen to be convex. We denote by $\mathcal{A}$
this convex achievable region; every point in $\mathcal{A}$ is achievable
by randomizing among Markovian policies.$^{1}$ (We note that
relaxing achievable state variables, but in a fluid model setting,
has also been exploited in [13].)

More specifically, [1] derives a polyhedron, say $\mathcal{P}$, that contains
the achievable region for $\lim_{t \to \infty} E[\mathbf{n}(t)]$ (see Fig. 1). Optimizing
$\lim_{t \to \infty} \mathbf{h}'E[\mathbf{n}(t)]$ over $\mathcal{P}$ yields an optimal solution, say
$\mathbf{w}^*$, whose cost is a lower bound on the optimal performance.
Although the polyhedron $\mathcal{P}$ has an exponential number of constraints
in $R$, optimizing $\lim_{t \to \infty} \mathbf{h}'E[\mathbf{n}(t)]$ can be done in polynomial
time by solving a linear programming (LP) problem in an
associated higher dimensional polyhedron with a polynomial in
$R$ number of variables and constraints [1]. The bound is often
quite tight, but in general $\mathbf{w}^* \notin \mathcal{A}$ and cannot be achieved by
any policy. An interesting question is whether $\mathbf{w}^*$ “contains information”
leading to a “good” policy.

Motivated by the fact that $\mathbf{w}^*$ can be computed efficiently
(in polynomial time) and that it is often “close” to the optimal
$\mathbf{z}^* = \arg\min_{\mathbf{z} \in \mathcal{A}} \mathbf{h}'\mathbf{z}$, we consider a myopic state feedback
policy that aims at “steering” the state of the system toward $\mathbf{w}^*$.
Such a policy belongs to the following class of policies.

Definition 1: We define as TP the class of scheduling policies
which at each time $t$ and for a finite time interval $\Delta t$, minimize

$$E\big[\, \|\mathbf{n}(t + \Delta t) - \boldsymbol{\theta}\| \;\big|\; \mathbf{n}(t) \,\big]$$

for some norm $\|\cdot\|$.

The review interval $\Delta t$ can be selected to be smaller than the
timescale of arrivals and services. The expectation in the previous
definition is with respect to the probability distribution of
$\mathbf{n}(t + \Delta t)$ conditional on the state being $\mathbf{n}(t)$ at time $t$. Furthermore,
the minimization is over all scheduling decisions made
at time $t$, i.e., we seek to select which job class is processed at
every node at time $t$. Note that the selection of the norm and
of $\boldsymbol{\theta}$ are left open. As defined, TP policies are not necessarily
work-conserving (i.e., servers can idle even if there is work to be
done). Henceforth, we refer to their work-conserving versions
as work-conserving TP policies.

$^{1}$ Here randomization among two policies A and B means “time-sharing,” i.e.,
take a large time interval and follow A for a fraction of that interval and B for
the remaining fraction. The resulting policy is non-Markovian.

Fig. 1. Achievable region $\mathcal{A}$ included in a polyhedron $\mathcal{P}$.

In the sequel, we will consider the norm of (1) and explore
several ways of selecting an appropriate “target” $\boldsymbol{\theta}$. As we
have indicated above, one potential target is $\mathbf{w}^*$. We will see
that setting $\boldsymbol{\theta} = \mathbf{w}^*$ often leads to a good policy. We note that
TP policies are myopic and greedy; thus, we do not claim
any optimality properties. We only establish stability results
and provide some analytical and numerical evidence that they
perform quite well.

IV. Implementation Issues

In this section, we discuss how to best implement TP policies.
We will see that the implementation complexity amounts
to solving an LP problem at each decision epoch. However, the
computations can be decomposed across nodes and nodes require
only (limited) local state information to perform them.

Consider the network of Section II and let us uniformize the
corresponding continuous-time Markov chain. In particular, set
$\nu = \sum_{r=1}^{R} \lambda_{0r} + \sum_{r=1}^{R} \mu_r$ and consider the uniformized chain
with uniform transition rate $\nu$. Let $\{\tau_k\}$ be the sequence of
epochs at which the uniformized chain makes transitions; this is
also the sequence of ticks from a “Poisson clock” with rate $\nu$. As
$\mathbf{n}(t)$ is right-continuous, $\mathbf{n}(\tau_k)$ refers to the state right after the
$k$th transition. Select $\Delta t \ll \min_r \min\{1, 1/\lambda_{0r}, 1/\mu_r\}$, i.e., $\Delta t$
is small enough and in a much faster timescale than arrivals and
services. In $(\tau_k, \tau_k + \Delta t]$ we have: an external class $r$ arrival with
probability $\lambda_{0r}\Delta t$, a class $r$ service completion with probability
$\mu_r \Delta t$ if node $\sigma(r)$ is working on class $r$ at $\tau_k$, a self-transition
with probability $\mu_r \Delta t$ if node $\sigma(r)$ is not working on class $r$ at
$\tau_k$, no transition with probability $1 - \nu\Delta t$, and more than one
transition with probability $o(\Delta t)$.
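The event probabilities listed above can be tabulated in a few lines. This is a minimal sketch under first-order accuracy (the $o(\Delta t)$ term is dropped); the function name `uniformized_event_probabilities` and the dictionary layout are hypothetical choices made for illustration.

```python
def uniformized_event_probabilities(lam0, mu, working, dt):
    """First-order probabilities of the events in (tau_k, tau_k + dt].

    lam0[r]    external arrival rate of class r
    mu[r]      service rate of class r
    working[r] True if node sigma(r) is serving class r right after tau_k
    Returns (nu, events), where events maps (kind, class) to a probability.
    """
    nu = sum(lam0) + sum(mu)  # uniform transition rate
    if nu * dt > 1.0:
        raise ValueError("dt is too large for the first-order approximation")
    events = {}
    for r in range(len(mu)):
        events[("arrival", r)] = lam0[r] * dt
        # The same rate mu[r] yields a real service completion only if class r
        # is in service; otherwise it is a fictitious self-transition.
        kind = "service" if working[r] else "self"
        events[(kind, r)] = mu[r] * dt
    events[("none", None)] = 1.0 - nu * dt
    return nu, events


nu, events = uniformized_event_probabilities(
    lam0=[0.3, 0.0, 0.0], mu=[1.0, 0.5, 2.0],
    working=[True, False, False], dt=0.01)
```

The probabilities sum to one by construction, which is exactly what uniformization is meant to guarantee: the total event rate $\nu$ does not depend on the scheduling decision.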

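In the uniformized Markov chain, scheduling decisions need
only be made at the epochs $\tau_k$. Let $B_r(\tau_k)$ be the event that
node $\sigma(r)$ is working on class $r$ right after $\tau_k$ and $\bar{B}_r(\tau_k)$ its
complement. For any $\boldsymbol{\theta} \in \mathbb{R}^R$, $\Delta t$ as defined before, and using
norm (1), the TP policy minimizes

$$
\begin{aligned}
\frac{1}{\Delta t} E\Big[ \|\mathbf{n}(\tau_k + \Delta t) - \boldsymbol{\theta}\|_{\boldsymbol{\beta}}^2 - \|\mathbf{n}(\tau_k) - \boldsymbol{\theta}\|_{\boldsymbol{\beta}}^2 \,\Big|\, \mathbf{n}(\tau_k) \Big]
= {} & \sum_{r=1}^{R} \lambda_{0r} \|\mathbf{n}(\tau_k) + \mathbf{e}_r - \boldsymbol{\theta}\|_{\boldsymbol{\beta}}^2 \\
& + \sum_{r=1}^{R} \mu_r 1\{B_r(\tau_k)\} \Big[ \sum_{r'=1}^{R} p_{rr'} \|\mathbf{n}(\tau_k) - \mathbf{e}_r + \mathbf{e}_{r'} - \boldsymbol{\theta}\|_{\boldsymbol{\beta}}^2 \\
& \qquad\qquad + p_{r0} \|\mathbf{n}(\tau_k) - \mathbf{e}_r - \boldsymbol{\theta}\|_{\boldsymbol{\beta}}^2 \Big] \\
& + \sum_{r=1}^{R} \mu_r 1\{\bar{B}_r(\tau_k)\} \|\mathbf{n}(\tau_k) - \boldsymbol{\theta}\|_{\boldsymbol{\beta}}^2 - \nu \|\mathbf{n}(\tau_k) - \boldsymbol{\theta}\|_{\boldsymbol{\beta}}^2 + \frac{o(\Delta t)}{\Delta t}.
\end{aligned} \tag{4}
$$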

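Set $\mathbf{x}(\tau_k) = (1\{B_1(\tau_k)\}, \ldots, 1\{B_R(\tau_k)\})$. Discarding constants,
notice that the right-hand side (RHS) of (4) can be written
as $\mathbf{x}(\tau_k)'\mathbf{q}(\mathbf{n}(\tau_k), \boldsymbol{\theta}, \boldsymbol{\beta}) + o(\Delta t)/\Delta t$, where $\mathbf{q}(\mathbf{n}(\tau_k), \boldsymbol{\theta}, \boldsymbol{\beta}) \in \mathbb{R}^R$
is appropriately defined. Since for small enough $\Delta t$ the first
term dominates, to implement the TP policy with norm $\|\cdot\|_{\boldsymbol{\beta}}$
we will be solving at each epoch $\tau_k$ the following LP problem:

$$
\begin{aligned}
(\text{LP1}) \quad \min \ & \mathbf{x}(\tau_k)'\mathbf{q}(\mathbf{n}(\tau_k), \boldsymbol{\theta}, \boldsymbol{\beta}) \\
\text{s.t.} \ & \sum_{r \in C_i} x_r(\tau_k) \le 1, \qquad i = 1, \ldots, N \\
& \mathbf{0} \le \mathbf{x}(\tau_k) \le \mathbf{n}(\tau_k)
\end{aligned} \tag{5}
$$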

where $\mathbf{x}(\tau_k)$ is the decision vector.$^{2}$ The first inequality constraint
above bounds the utilization of each server by one and
the constraint $\mathbf{x}(\tau_k) \le \mathbf{n}(\tau_k)$ ensures that no capacity is allocated
to empty classes. If we impose work conservation, the first
inequality constraint becomes an equality at all nodes $i$ with jobs
present. Under both work-conserving and nonwork-conserving
TP policies, the constraint matrix is totally unimodular; hence,
the feasible set is a polytope with integer extreme points and
(LP1) yields an integer optimal solution.
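Because each class appears in exactly one capacity constraint, an integer optimal solution of (LP1) can also be read off by inspection rather than through a general LP solver. The sketch below illustrates this under the assumption that queue lengths are nonnegative integers and that a cost vector `q` has already been computed; the function name and the zero-based node labels in `sigma` are hypothetical.

```python
def solve_lp1_by_inspection(q, n, sigma, num_nodes, work_conserving=False):
    """Return a 0/1 vector x minimizing sum_r q[r] * x[r] subject to
    sum_{r in C_i} x[r] <= 1 at every node i and 0 <= x[r] <= n[r].

    With work_conserving=True the capacity constraint is an equality at
    every node that has jobs present.
    """
    x = [0] * len(q)
    for i in range(num_nodes):
        # Only nonempty classes served at node i may receive capacity.
        candidates = [r for r in range(len(q)) if sigma[r] == i and n[r] > 0]
        if not candidates:
            continue
        best = min(candidates, key=lambda r: q[r])
        # Serving a class only helps if its cost coefficient is negative,
        # unless idling is forbidden.
        if work_conserving or q[best] < 0:
            x[best] = 1
    return x


q = [-1.0, 2.0, -3.0, 0.5]
n = [2, 1, 0, 4]
sigma = [0, 1, 0, 1]
x_idle_allowed = solve_lp1_by_inspection(q, n, sigma, num_nodes=2)
x_work_conserving = solve_lp1_by_inspection(q, n, sigma, num_nodes=2,
                                            work_conserving=True)
```

In the example, class 2 has the most negative coefficient but is empty, so node 0 serves class 0; node 1 idles unless work conservation is imposed.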

A couple of remarks on the implementation complexity are in
order. The size of (LP1) is $O(RN)$, which is polynomial in the
size of the MQNET. Very large instances of LP problems can
be solved efficiently (in polynomial time) using interior-point
algorithms. For large networks, though, the computational requirements
for solving (LP1) can be substantial. Moreover,
the formulation in (5) requires a centralized computation with
global state information. Fortunately, the work can be decentralized
and distributed across various nodes. To see that, and for
simplicity of the exposition, let $\boldsymbol{\beta} = \mathbf{e}$. Decomposing (LP1)
across nodes, node $i$ has to solve

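$$
\begin{aligned}
(\text{Node-LP1}) \quad \min \ & \sum_{r \in C_i} \mu_r x_r(\tau_k)(2 - p_{r0}) - 2 \sum_{r \in C_i} \mu_r x_r(\tau_k)\, p_{r0}\, \big(n_r(\tau_k) - \theta_r\big) \\
& + 2 \sum_{r \in C_i} \mu_r x_r(\tau_k) \sum_{r'=1}^{R} p_{rr'} \big[ n_{r'}(\tau_k) - \theta_{r'} - n_r(\tau_k) + \theta_r \big] \\
\text{s.t.} \ & \sum_{r \in C_i} x_r(\tau_k) \le 1 \\
& 0 \le x_r(\tau_k) \le n_r(\tau_k), \qquad r \in C_i
\end{aligned}
$$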

where $x_r(\tau_k)$, $r \in C_i$, are the decision variables. Typically the
number of classes served at an arbitrary node $i$ is much less
than $R$. Moreover, to solve (Node-LP1), node $i$ needs state
information for classes $r \in C_i$ and all classes within one hop,
i.e., $r'$ with $p_{rr'} > 0$ for all $r \in C_i$. The number of such $r'$ would
also be much less than $R$ in most cases. Thus, (Node-LP1)
can be solved by each node using local information much faster
than solving (LP1).

$^{2}$ Equivalently, and to avoid inaccuracies due to $\Delta t$, we can define the TP
policy as the policy obtained through (LP1).
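To make the locality claim concrete, the following sketch computes the cost coefficient of a single class using only its own queue length and those of its one-hop successors. It is an illustration under stated assumptions: the coefficient is taken to be the expected change in squared weighted distance caused by a class $r$ service completion, scaled by $\mu_r$, and the function name `local_cost_coefficient` is hypothetical. For $\boldsymbol{\beta} = \mathbf{e}$ and no self-routing it should agree with the objective coefficients of a node-level LP of the kind discussed above.

```python
def local_cost_coefficient(r, n, theta, beta, mu, P):
    """Coefficient of x_r: mu_r times the expected change in
    ||n - theta||_beta^2 when one class-r job completes service.

    Only n[r] and n[rp] for one-hop successors rp (P[r][rp] > 0) are read.
    """
    # Change in squared distance when class r loses one job.
    leave = beta[r] * (1.0 - 2.0 * (n[r] - theta[r]))
    p_exit = 1.0 - sum(P[r])
    total = p_exit * leave
    for rp, p in enumerate(P[r]):
        if p == 0.0 or rp == r:
            # rp == r: the job rejoins its own class, so nothing changes.
            continue
        # Change in squared distance when class rp gains one job.
        gain = beta[rp] * (1.0 + 2.0 * (n[rp] - theta[rp]))
        total += p * (leave + gain)
    return mu[r] * total


# Hypothetical two-class tandem: class 0 feeds class 1, which exits.
n = [3, 1]
theta = [1.0, 1.0]
beta = [1.0, 1.0]
mu = [1.0, 2.0]
P = [[0.0, 1.0], [0.0, 0.0]]
q = [local_cost_coefficient(r, n, theta, beta, mu, P) for r in range(2)]
```

Here class 0 is above its target, so serving it has a negative coefficient, whereas class 1 sits exactly at its target and serving it would move the state away from $\boldsymbol{\theta}$.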

V.  Stability  Analysis 

In this section, we show that TP policies are stable. To that
end, and following [7] and [20], we consider a fluid model, establish
stability of the fluid model, and then infer the stability of
the stochastic system.

A.  Fluid  Model 

To avoid overburdening our notation we use $\mathbf{n}(t)$ to denote
the queue length vector in the fluid model as well; it will be evident
from the context whether we refer to the fluid model or
the stochastic system. Let $T_r(t)$ be the cumulative amount of time
server $\sigma(r)$ has spent working on class $r$ in $[0, t]$. Let
$\mathbf{M} = \mathrm{diag}(\mu_1, \ldots, \mu_R)$ and $\mathbf{u}(t) = (u_1(t), \ldots, u_R(t))$, where
$\mathrm{diag}(x_1, \ldots, x_R)$ denotes the diagonal matrix with main diagonal
$x_1, \ldots, x_R$ and zeroes elsewhere. Let also $\mathbf{C} = (c_{ir})$ be
the constituency matrix of the network with $c_{ir} = 1\{\sigma(r) = i\}$,
for all $r = 1, \ldots, R$ and $i = 1, \ldots, N$. In the fluid model, for
all $t \ge 0$ we have

$$
\begin{aligned}
\dot{\mathbf{n}}(t) &= \boldsymbol{\lambda}_0 - (\mathbf{I} - \mathbf{P}')\mathbf{M}\mathbf{u}(t) \\
\mathbf{C}\mathbf{u}(t) &\le \mathbf{e} \\
\mathbf{n}(t), \mathbf{u}(t) &\ge \mathbf{0}.
\end{aligned} \tag{6}
$$

Here, $u_r(t) = \dot{T}_r(t)$, which can be interpreted as the fraction of
the capacity of server $\sigma(r)$ allocated to class $r$ at time $t$. The
functions $n_r(t)$ and $T_r(t)$ are absolutely continuous, and thus
differentiable almost everywhere. The equations in (6) hold for
all times $t$ at which $n_r(t)$ and $T_r(t)$ are differentiable; these
points in time will be referred to as regular. We next derive the
fluid version of the TP policy under the norm in (1).
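The fluid dynamics in (6) are easy to evaluate for a given allocation. The sketch below computes the drift $\boldsymbol{\lambda}_0 - (\mathbf{I} - \mathbf{P}')\mathbf{M}\mathbf{u}$ and checks the capacity constraint; it then verifies, on a hypothetical re-entrant line, that the allocation $u_r = \rho_r$ produces zero drift, as the traffic equations suggest. Function names and zero-based indices are assumptions made for illustration.

```python
def fluid_drift(lam0, P, mu, u):
    """Return dn/dt = lam0 - (I - P') M u, component by component."""
    R = len(lam0)
    served = [mu[r] * u[r] for r in range(R)]  # departure rate of each class
    return [lam0[r] - served[r] + sum(P[rp][r] * served[rp] for rp in range(R))
            for r in range(R)]


def is_feasible_allocation(u, sigma, num_nodes, tol=1e-12):
    """Check u >= 0 and C u <= e, where sigma[r] is the node of class r."""
    load = [0.0] * num_nodes
    for r, node in enumerate(sigma):
        if u[r] < -tol:
            return False
        load[node] += u[r]
    return all(l <= 1.0 + tol for l in load)


# Hypothetical re-entrant line: class 0 (node 0) -> class 1 (node 1)
# -> class 2 (node 0) -> exit, with total arrival rate 0.3 for every class.
lam0 = [0.3, 0.0, 0.0]
P = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
mu = [1.0, 0.5, 2.0]
sigma = [0, 1, 0]
rho = [0.3 / m for m in mu]
drift = fluid_drift(lam0, P, mu, rho)
```

An allocation with zero drift keeps every fluid level constant, which is the benchmark that the stability argument later improves upon.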

Note that at every decision epoch TP policies minimize the
expression given in Definition 1 subject to the constraints of
(LP1) [cf. (5)]. Let us first consider the objective function in
this minimization. Let $|\mathbf{n}(0)| = k > 0$ and consider the fluid
scaling of the stochastic system

$$\bar{\mathbf{n}}^k(t) = \frac{1}{k}\, \mathbf{n}^k(kt)$$

where $\mathbf{n}^k(\cdot)$ denotes the queue length vector in the stochastic
system initialized with $|\mathbf{n}(0)| = k$. Since we will deal with
the limit $k \to \infty$ in the space of sample paths of $\bar{\mathbf{n}}^k(t)$, let
us explicitly write $\bar{\mathbf{n}}^k(t, \omega)$ for a particular sample path $\omega$ of
$\bar{\mathbf{n}}^k(t)$. We restrict ourselves only to $\omega$ satisfying the strong law
of large numbers (SLLN) for the arrival, service, and routing
processes. It is proved in [20] that if $|\mathbf{n}^k(0, \omega)|/k$ is bounded as
$k \to \infty$, then $\bar{\mathbf{n}}^k(\cdot, \omega)$ is precompact as $k \to \infty$ in the Skorohod path
space $D_{\mathbb{R}^R}[0, \infty)$ endowed with the u.o.c. (uniformly on compact
sets) topology. This implies that $\bar{\mathbf{n}}^k(\cdot, \omega)$ is tight as $k \to \infty$
([21]–[23]). Thus, for each sequence $k \to \infty$ there exists a
subsequence $k_s \to \infty$ along which

$$\bar{\mathbf{n}}^{k_s}(\cdot, \omega) \to \mathbf{n}(\cdot), \qquad \text{u.o.c.}$$
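for some process $\mathbf{n} \in D_{\mathbb{R}^R}[0, \infty)$ which is called fluid limit and
satisfies (6). At time $kt$ the TP policy minimizes

$$E\big[\, \|\mathbf{n}(kt + \Delta t) - \boldsymbol{\theta}\| - \|\mathbf{n}(kt) - \boldsymbol{\theta}\| \;\big|\; \mathbf{n}(kt) \,\big].$$

Scaling by $1/k$, this is equivalent to minimizing

$$E\left[\, \left\| \frac{\mathbf{n}\big(k(t + \Delta t/k)\big)}{k} - \frac{\boldsymbol{\theta}}{k} \right\| - \left\| \frac{\mathbf{n}(kt)}{k} - \frac{\boldsymbol{\theta}}{k} \right\| \;\middle|\; \mathbf{n}(kt) \,\right].$$

Taking $k \to \infty$, and since the stochastic system converges to
the fluid limit for all $\omega$ considered before, we conclude that for
all $\boldsymbol{\theta}$ the fluid version of the TP policy seeks to minimize

$$\frac{d}{dt} \|\mathbf{n}(t)\| \tag{7}$$

at regular $t$. Using the norm in (1) and taking the fluid limit of
the policy prescribed by (LP1), the corresponding fluid version
minimizes $d\|\mathbf{n}(t)\|_{\boldsymbol{\beta}}^2/dt$.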

Consider next the constraints of (LP1) under which the minimization
of (7) is done. The constraints of (LP1) under the
fluid scaling translate to the constraints in (6) with the additional
constraint that for all classes $r$ it holds $u_r(t) = 0$ whenever
$n_r(t) = 0$. Therefore, the fluid version of the TP policy minimizes
the expression in (7) subject to the fluid feasibility constraints
of (6) and the additional “idle when empty” constraint
indicated above. This policy is well defined for all regular $t$ and
we will refer to it as the fluid target-pursuing (FTP) policy. FTP
aims at maximizing the negative drift and driving the fluid level
toward zero. It can be shown that the amounts of time allocated
to various classes in the stochastic system under the TP policy
converge to corresponding quantities in the fluid model under
the FTP.
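The FTP policy can be explored numerically with a simple forward-Euler discretization of (6). The sketch below is an illustration, not the authors' implementation: since $d(\mathbf{n}'\mathbf{B}\mathbf{n})/dt$ is linear in $\mathbf{u}$ with coefficient $-2\mu_r[(\mathbf{I} - \mathbf{P})\mathbf{B}\mathbf{n}]_r$ on $u_r$, each node serves the nonempty class with the largest positive value of $\mu_r[(\mathbf{I} - \mathbf{P})\mathbf{B}\mathbf{n}]_r$ and idles otherwise. The step size, horizon, clipping at zero, and all function names are assumptions made here.

```python
def ftp_allocation(n, beta, mu, P, sigma, num_nodes):
    """Greedy minimizer of d(n'Bn)/dt subject to C u <= e, u >= 0 and
    u_r = 0 for empty classes (sigma[r] is the node of class r)."""
    R = len(n)
    # gain[r] = mu_r * [(I - P) B n]_r; the drift of n'Bn is -2 * sum gain*u.
    gain = [mu[r] * (beta[r] * n[r]
                     - sum(P[r][rp] * beta[rp] * n[rp] for rp in range(R)))
            for r in range(R)]
    u = [0.0] * R
    for i in range(num_nodes):
        candidates = [r for r in range(R) if sigma[r] == i and n[r] > 0]
        if candidates:
            best = max(candidates, key=lambda r: gain[r])
            if gain[best] > 0:
                u[best] = 1.0
    return u


def simulate_ftp(n0, lam0, beta, mu, P, sigma, num_nodes, dt=0.01, horizon=5.0):
    """Forward-Euler integration of the fluid model under the greedy rule."""
    n = list(n0)
    R = len(n)
    for _ in range(int(round(horizon / dt))):
        u = ftp_allocation(n, beta, mu, P, sigma, num_nodes)
        served = [mu[r] * u[r] for r in range(R)]
        n = [max(0.0, n[r] + dt * (lam0[r] - served[r]
                                   + sum(P[rp][r] * served[rp] for rp in range(R))))
             for r in range(R)]
    return n


# Hypothetical two-node tandem: class 0 at node 0 feeds class 1 at node 1.
n_final = simulate_ftp(n0=[1.0, 0.0], lam0=[0.5, 0.0], beta=[1.0, 1.0],
                       mu=[1.0, 1.0], P=[[0.0, 1.0], [0.0, 0.0]],
                       sigma=[0, 1], num_nodes=2)
```

In this example the fluid level drains from its initial value and then chatters in a small neighborhood of the origin whose size is governed by the Euler step, which is consistent with the draining behavior described above.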


B.  Stability  of  the  Fluid  Model 

We next establish the stability of the fluid model operating
under the FTP policy, that is, the nonwork-conserving policy
minimizing $d\|\mathbf{n}(t)\|_{\boldsymbol{\beta}}^2/dt$ for each $t$.

Proposition V.1: Consider the fluid model operating under
the nonwork-conserving FTP policy which uses the weighted
L2 norm $\|\mathbf{n}(t)\|_{\boldsymbol{\beta}}$, where $\boldsymbol{\beta} > \mathbf{0}$. For every solution of the fluid
equations (6) satisfying $|\mathbf{n}(0)| \le 1$ and $u_r(t) = 0$ whenever
$n_r(t) = 0$ for all $r$, there exists some $\delta(\eta) > 0$ such that for all
$0 < \eta < 1$ and all $t \ge \delta$ it follows $|\mathbf{n}(t)| \le \eta$.

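Proof: Fix $\eta \in (0, 1)$. Let $\mathbf{B} = \mathrm{diag}(\beta_1, \ldots, \beta_R)$ and

$$G(t) \triangleq \|\mathbf{n}(t)\|_{\boldsymbol{\beta}}^2 = \mathbf{n}'(t)\mathbf{B}\mathbf{n}(t).$$

Clearly, $G(t) = 0$ if and only if $\mathbf{n}(t) = \mathbf{0}$, and $G(\mathbf{n}(t))$ can
be shown to be locally Lipschitz continuous in $\mathbf{n}(t)$, i.e., for
any compact set $O$, there exists a constant $\kappa(O)$ such that for
any $\mathbf{n}_1(t), \mathbf{n}_2(t) \in O$ it holds that $|G(\mathbf{n}_1(t)) - G(\mathbf{n}_2(t))| \le
\kappa(O)\,|\mathbf{n}_1(t) - \mathbf{n}_2(t)|$. We show next that $G(t)$ is nonincreasing
in $t$. Using the fluid model dynamics of (6) we obtain

$$\dot{G}(t) = 2\mathbf{n}'(t)\mathbf{B}\,[\boldsymbol{\lambda}_0 + \mathbf{P}'\mathbf{M}\mathbf{u}(t) - \mathbf{M}\mathbf{u}(t)]. \tag{8}$$

Let us now employ the nonwork-conserving policy that assigns
a fraction $u_r(t) = \rho_r$ of node $\sigma(r)$'s capacity to all nonempty
classes $r$ and zero capacity to all empty classes. Let $\hat{\mathbf{u}}(t)$ be
the vector induced by this policy. Let also $\bar{\mathbf{u}} = (\rho_1, \ldots, \rho_R)$.
Note that for all $t$, $\mathbf{P}'\mathbf{M}\hat{\mathbf{u}}(t) \le \mathbf{P}'\mathbf{M}\bar{\mathbf{u}}$, and $\mathbf{n}'(t)\mathbf{B}\mathbf{M}\hat{\mathbf{u}}(t) =
\mathbf{n}'(t)\mathbf{B}\mathbf{M}\bar{\mathbf{u}}$. As a result, (8) implies

$$\dot{G}(t) \le 2\mathbf{n}'(t)\mathbf{B}\,[\boldsymbol{\lambda}_0 - (\mathbf{I} - \mathbf{P}')\mathbf{M}\bar{\mathbf{u}}] = 0 \tag{9}$$

where in the last equation we used the traffic equations (cf.
(2)). Since the FTP policy described in the statement of the
Proposition minimizes $\dot{G}(t)$ for all $t$, we will have $\dot{G}(t) \le 0$
for this latter policy. Thus, $G(t)$ is nonincreasing in time and
$G(t) \le G(0)$ for all $t \ge 0$.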

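Let $\xi_r(t) = \beta_r n_r(t) - \sum_{r'=1}^{R} p_{rr'}\beta_{r'} n_{r'}(t)$, $\boldsymbol{\xi}(t) =
(\xi_1(t), \ldots, \xi_R(t))$, and note that $\boldsymbol{\xi}(t) = (\mathbf{I} - \mathbf{P})\mathbf{B}\mathbf{n}(t)$. Due to
the monotonicity of $G(t)$, for all $t \ge 0$ and $r$ we have

$$\beta_{\min} (n_r(t))^2 \le \beta_r (n_r(t))^2 \le \sum_{r=1}^{R} \beta_r (n_r(t))^2 \le G(0)$$

where $\beta_{\min} = \min_r \beta_r$. Thus, $n_r(t) \le \sqrt{G(0)/\beta_{\min}}$ for all
$t \ge 0$ and $r$. This implies that for all $t \ge 0$ and $r$

$$|\xi_r(t)| \le \beta_r n_r(t) + \sum_{r'=1}^{R} p_{rr'}\beta_{r'} n_{r'}(t) \le \beta_{\max} \sqrt{\frac{G(0)}{\beta_{\min}}}\,(2 - p_{r0}) \tag{10}$$

where $\beta_{\max} = \max_r \beta_r$. Consider the nonwork-conserving
policy which allocates to class $r$

$$u_r(t) = \begin{cases} \rho_r + \epsilon\,\xi_r(t)/\mu_r, & \text{if } n_r(t) > 0 \\ 0, & \text{otherwise.} \end{cases} \tag{11}$$

Let $\epsilon > 0$ be such that at all $t \ge 0$ we have $\sum_{r \in C_j} u_r(t) < 1$ for all nodes
$j$ and $\lambda_r + \epsilon\,\xi_r(t) > 0$ for all $r$. Such an $\epsilon > 0$ exists due to (10).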

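Using this policy, from (8), we obtain

$$
\begin{aligned}
\dot{G}(t) &\le 2 \sum_{r \,\mid\, n_r(t) > 0} \beta_r n_r(t) \Big[ \lambda_{0r} - \lambda_r - \epsilon\,\xi_r(t) + \sum_{r'=1}^{R} p_{r'r}\big(\lambda_{r'} + \epsilon\,\xi_{r'}(t)\big) \Big] \\
&= -2\epsilon \sum_{r=1}^{R} \beta_r n_r(t) \Big[ \xi_r(t) - \sum_{r'=1}^{R} p_{r'r}\,\xi_{r'}(t) \Big] \\
&= -2\epsilon\, \mathbf{n}'(t)\mathbf{B}(\mathbf{I} - \mathbf{P}')(\mathbf{I} - \mathbf{P})\mathbf{B}\mathbf{n}(t).
\end{aligned}
$$

In the first inequality shown above, we used the fact
$\lambda_r + \epsilon\,\xi_r(t) > 0$ for all $r$ and $t \ge 0$. In the first equality above we
used the traffic equations in (2). Let $\mathbf{D} = \mathbf{B}(\mathbf{I} - \mathbf{P}')(\mathbf{I} - \mathbf{P})\mathbf{B}$
and note that $\mathbf{D}$ is symmetric. Since we are dealing with an
open queueing network, $(\mathbf{I} - \mathbf{P}')$ is invertible; thus, $\mathbf{B}(\mathbf{I} - \mathbf{P}')$
is also invertible. Hence, $\mathbf{D}$ is positive definite and has real
and strictly positive eigenvalues. Letting $s_{\min}$ be the smallest
eigenvalue of $\mathbf{D}$ we obtain

$$\dot{G}(t) \le -2\epsilon\, \mathbf{n}'(t)\mathbf{D}\mathbf{n}(t) \le -2 s_{\min}\epsilon\, \|\mathbf{n}(t)\|_2^2. \tag{12}$$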

Whenever  |n{t)|  >  77,  we  have 

G(f)  >  Anin  E  (^(*))2  >  Aning,yyj)  >  Anin  ^  ■ 
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This  implies 

!|n(0l|2  >  n'(f}Bn(t)  > 

which  implies  in  turn 

hoi’  >  ir~  =  <i3> 

P max  ** 

Consequently,  in  the  fluid  model  under  policy  (11),  whenever 
G(t)  >  0m (12)  and  (13)  yield 

G(t)  <  -2*^4*.  (14) 

Since the FTP policy described in the statement of the Proposition minimizes $\dot G(t)$ for all $t$, $\dot G(t)$ will be upper-bounded by $-2 s_{\min} \epsilon \beta_{\min} \eta^2 / (\beta_{\max} R)$ under the latter policy as well.

Suppose now $G(0) > \beta_{\min} \eta^2 / R$. Equation (14) implies that $G(t)$ will reach the region $G(t) \le \beta_{\min} \eta^2 / R$ within time $t_\eta$ where

$$t_\eta \le \frac{G(0) - \beta_{\min} \eta^2 / R}{2 s_{\min} \epsilon \beta_{\min} \eta^2 / (\beta_{\max} R)}.$$

Furthermore, $G(t)$ will remain in this region for all $t \ge t_\eta$, since it is a nonincreasing function of time. We conclude that for all $t \ge t_\eta$ it holds that $|n(t)| \le \eta$, since otherwise $G(t) > \beta_{\min} \eta^2 / R$. Finally, in the case $G(0) \le \beta_{\min} \eta^2 / R$ the same argument applies and $|n(t)| \le \eta$ for all $t \ge 0$. ■
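The decay argument in the proof can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes $\xi(t) = (I - P)Bn(t)$, uses a plain Euler scheme, and the function name `simulate_policy_11` and the two-class tandem example are hypothetical choices, not taken from the paper.

```python
def simulate_policy_11(n0, lam0, mu, P, beta, eps, dt=0.01, steps=4000):
    """Euler-integrate the fluid model under the allocation of policy (11).

    n0: initial fluid levels, lam0: external arrival rates, mu: service rates,
    P[r][s]: routing probability from class r to class s, beta: norm weights.
    Returns G(t) = n(t)' B n(t) sampled at every Euler step.
    """
    R = len(n0)
    # Effective rates from the traffic equations lam = lam0 + P' lam, by
    # fixed-point iteration (converges because the network is open).
    lam = list(lam0)
    for _ in range(1000):
        lam = [lam0[r] + sum(P[s][r] * lam[s] for s in range(R)) for r in range(R)]
    n = list(n0)
    G = [sum(beta[r] * n[r] ** 2 for r in range(R))]
    for _ in range(steps):
        # xi = (I - P) B n (assumed form of the vector used in policy (11))
        xi = [beta[r] * n[r] - sum(P[r][s] * beta[s] * n[s] for s in range(R))
              for r in range(R)]
        # policy (11): capacity only goes to nonempty classes
        u = [(lam[r] + eps * xi[r]) / mu[r] if n[r] > 0 else 0.0 for r in range(R)]
        ndot = [lam0[r] + sum(mu[s] * u[s] * P[s][r] for s in range(R)) - mu[r] * u[r]
                for r in range(R)]
        n = [max(0.0, n[r] + dt * ndot[r]) for r in range(R)]
        G.append(sum(beta[r] * n[r] ** 2 for r in range(R)))
    return G


# Two classes in tandem (class 0 feeds class 1, which exits), one class per node.
G = simulate_policy_11(n0=[0.6, 0.4], lam0=[0.5, 0.0], mu=[1.0, 1.0],
                       P=[[0.0, 1.0], [0.0, 0.0]], beta=[1.0, 1.0], eps=0.3)
print(G[0], G[-1])
```

In this example $G$ never increases along the trajectory and shrinks by several orders of magnitude over the horizon, in line with (12).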

C.  Stability  of  the  Stochastic  Network 

We conclude this section by establishing that the MQNET is stable under the TP policy using the norm in (1). Note that for any target $\theta \in \mathbb{R}^R$ the TP policy is Markovian and under this policy the state of the network is the queue length vector $n(t) \in \mathbb{Z}_+^R$ that evolves as a continuous-time Markov chain. The next theorem establishes that this Markov chain is positive Harris recurrent (see [7], [20], [21]).

Theorem V.2: Consider the MQNET of Section II operated under the TP policy that uses the norm $\|n(t)\|_\beta$ where $\beta > 0$. The Markov chain $n(t)$ is positive Harris recurrent.

Proof: We will slightly modify the proof in [7]. Proposition V.1 establishes that there exists some $\delta > 1$ such that for any solution $\bar n(t)$ of the fluid model equations and any $0 < \eta < 1$ we have $|\bar n(t)| \le \eta$ for all $t \ge \delta$. Let $\{z_k\}$ be any sequence of initial states $n(0)$ with $|z_k| \to \infty$ as $k \to \infty$. From the existence of the fluid limit (see [7]) there exists a subsequence $\{z_{k_j}\}$ such that

$$\lim_{j \to \infty} \frac{1}{|z_{k_j}|}\, n^{z_{k_j}}(|z_{k_j}| \delta) = \bar n(\delta)$$

where, as in Section V-A, $\bar n(\cdot)$ denotes the fluid limit. The fluid limit satisfies the fluid model equations, thus,

$$\lim_{j \to \infty} \frac{1}{|z_{k_j}|}\, \big| n^{z_{k_j}}(|z_{k_j}| \delta) \big| \le \eta.$$

Using the uniform integrability (see [7, Lemma 4.5]) of the sequence on the left-hand side, we obtain

$$\lim_{j \to \infty} \frac{1}{|z_{k_j}|}\, E\big[ | n^{z_{k_j}}(|z_{k_j}| \delta) | \big] \le \eta.$$

Since $\{z_k\}$ is an arbitrary sequence we have

$$\lim_{|z| \to \infty} \frac{1}{|z|}\, E\big[ | n^{z}(|z| \delta) | \big] \le \eta.$$

Let $0 < \epsilon < 1 - \eta$. There exists some $\kappa > 1$ such that

$$\frac{1}{|z|}\, E\big[ | n^{z}(|z| \delta) | \big] \le 1 - \epsilon$$

for all $z$ with $|z| > \kappa$. The remainder of the proof follows exactly the proof of [7, Th. 3.1]. ■


VI.  Optimizing  Over  Policy  Parameters 

(LP1) suggests that if class $i$ and $j$ jobs are processed at the same node, $q_i(n(\tau_k), \theta, \beta) = q_j(n(\tau_k), \theta, \beta)$ constitutes a policy switching hyperplane. Namely, TP policies are characterized by switching hyperplanes determined by policy parameters. In this section, we discuss how we can optimize over these parameters, that is, the target $\theta$ and the weight vector $\beta$, in order to obtain the best policy within the class [i.e., minimizing (3)]. As mentioned in Section III, the achievable region LP provides a tentative value of $\theta$ equal to $w^*$ which, as we will see, performs quite well. Here, we are interested in further improving the selection of $\theta$ and optimizing over $\beta$ as well. To that end, we use a simulation-based method developed in [24]. The underlying idea is rather simple: during the course of a simulation of the system we obtain "gradient information" which we then use to optimize over the parameters.

1) Smooth Target-Pursuing Policies: To fix notation, consider the uniformized Markov chain of Section IV and the TP policy outlined there with weight vector $\beta > 0$. At each transition epoch $\tau_k$, scheduling decisions are made according to the optimal solution, say $x^*(n(\tau_k), \theta, \beta)$, of (LP1). Note that $x^*(n(\tau_k), \theta, \beta)$ is piecewise constant in $(\theta, \beta)$ with the jumps occurring at the points where the optimal solution switches from one extreme point of the feasible set to another. Consequently, using a simulation-based gradient optimization method to optimize over the parameters would not be very successful since the gradients would be zero most of the time.

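To bypass this difficulty we use randomization to introduce a smoother version of our TP policies. For simplicity of the exposition, we concentrate on work-conserving TP policies; the nonwork-conserving case can be handled similarly. Let $y^{(r)}(n(\tau_k), \theta, \beta)$ be a feasible solution of (LP1) such that at time $\tau_k$ class $r$ is served at node $\sigma(r)$ and the remaining decisions at all other nodes coincide with $x^*(n(\tau_k), \theta, \beta)$. Let $\gamma > 0$ be a scalar and set

$$a_r(n(\tau_k), \theta, \beta) = e^{-\gamma\, y^{(r)}(n(\tau_k), \theta, \beta)'\, q(n(\tau_k), \theta, \beta)}. \qquad (15)$$

At time $\tau_k$, we serve class $r$ at node $\sigma(r)$ with probability

$$\alpha_r(n(\tau_k), \theta, \beta) = \begin{cases} \dfrac{a_r(n(\tau_k), \theta, \beta)}{\sum_{r' \in C_{\sigma(r)} :\, n_{r'}(\tau_k) > 0} a_{r'}(n(\tau_k), \theta, \beta)}, & \text{if } n_r(\tau_k) > 0 \\ 0, & \text{otherwise.} \end{cases} \qquad (16)$$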
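The randomization in (15) and (16) amounts to a softmax over the nonempty classes at a node. The sketch below is a minimal illustration, assuming the per-class objective values $y^{(r)\prime} q$ have already been computed and are passed in as `cost`; the function name `stp_probabilities` is hypothetical.

```python
import math


def stp_probabilities(cost, n, gamma):
    """Service probabilities at one node under the smooth TP (STP) policy.

    cost[r]: objective value of serving class r at this node (y^(r)' q in (15)),
    n[r]: current queue length of class r, gamma: smoothing parameter.
    Empty classes get probability zero, as in (16).
    """
    nonempty = [r for r in range(len(n)) if n[r] > 0]
    if not nonempty:
        return [0.0] * len(n)
    # Shifting all costs by a constant leaves the ratios unchanged and
    # avoids overflow/underflow in the exponentials.
    c_min = min(cost[r] for r in nonempty)
    a = {r: math.exp(-gamma * (cost[r] - c_min)) for r in nonempty}
    total = sum(a.values())
    return [a[r] / total if r in a else 0.0 for r in range(len(n))]


costs = [-2.0, -1.0, -5.0]   # hypothetical objective values
queues = [3, 1, 0]           # class 2 is empty
print(stp_probabilities(costs, queues, gamma=0.0))    # uniform over nonempty classes
print(stp_probabilities(costs, queues, gamma=50.0))   # concentrates on the cheapest class
```

With `gamma = 0` the two nonempty classes are served with equal probability, and for large `gamma` nearly all probability moves to the nonempty class with the smallest objective value, which is the choice a deterministic minimizer would make.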




Notice that as $\gamma \to 0$ all nonempty classes at a node have equal probability of being served, and as $\gamma \to \infty$ the randomized policy converges to the policy implied by $x^*(n(\tau_k), \theta, \beta)$. The expression in (15) can be simplified as

$$\alpha_r(n(\tau_k), \theta, \beta) = \frac{e^{-\gamma q_r(n(\tau_k), \theta, \beta)}}{\sum_{r' \in C_{\sigma(r)} :\, n_{r'}(\tau_k) > 0} e^{-\gamma q_{r'}(n(\tau_k), \theta, \beta)}}$$


where $q_r(n(\tau_k), \theta, \beta)$ is the $r$th coordinate of $q(n(\tau_k), \theta, \beta)$. Henceforth, we will be referring to this policy as the work-conserving smooth target-pursuing (STP) policy. This randomization scheme is useful in satisfying the conditions required by the simulation-based optimization algorithm we implement.

2) Simulation-Based Optimization: We adopted the STP policy and used the simulation-based method of [24] to optimize the objective of (3) over the parameters $(\theta, \beta)$. In Section IX, we report illustrative numerical results and compare with a set of other scheduling policies. Under a set of stability and regularity conditions and a standard diminishing step-size rule, the algorithm in [24] is shown to converge w.p. 1 to a local minimum. In our setting, the required stability condition is satisfied due to the result in Section V. The remaining regularity conditions, however, are not satisfied in all cases of interest.

Under the STP policy, $\alpha_r(n(\tau_k), \theta, \beta)$ satisfies the required regularity conditions with respect to $\theta$ but not with respect to $\beta$. As a result, we fixed $\beta$ and used the method in [24] to optimize over $\theta$. We then employed random search around $\beta = e$ to select a good $\beta$. Admittedly, using a simulation-based method to optimize over $\theta$ can be slow. We were encouraged to notice that initializing the algorithm with a tentative value equal to $w^*$, obtained from the achievable region LP, led to considerably faster convergence.


VII. Combined Routing/Scheduling Decisions

In this section, we extend the basic queueing network model of Section II to consider the case where routing is not fixed but also subject to optimization.

We adopt the model and notation of Section II, indicating only the differences with the extended model we consider here. As in Section II, jobs of class $r = 1, \ldots, R$ arrive to the network according to a Poisson arrival process with rate $\lambda_{0r}$. Upon arrival, though, and before joining the corresponding queue, a router selects a particular class and routes the arriving job to that class. Let $A_{rr'}(t)$ denote the event that an externally arriving job of class $r$ is routed to class $r'$ upon its arrival at time $t$. Routing decisions are also made at the various nodes when jobs are admitted for service. Let $B_{rr'}(t)$ denote the event that at time $t$ node $\sigma(r)$ is working on a class $r$ job that will be routed to class $r'$ upon completion of service.

In this modified setting, we are interested in devising a combined scheduling and routing policy to minimize the cost function (3). Target-pursuing (TP) policies are defined exactly as in Section III (cf. Definition 1) with the only exception that the minimization is with respect to both scheduling and routing decisions at each time $t$. A polyhedral relaxation $P$ of the achievable region $A$ can be obtained in this case as well (see [1]); an optimal solution of this achievable region LP, denoted again by $w^*$, is one particular choice for $\theta$.

To describe the implementation of TP policies in the extended model we uniformize the Markov chain $n(t)$ as in Section IV. Let again $\nu$ denote the uniform transition rate and $\{\tau_k\}$ the sequence of transition epochs in the uniformized Markov chain. For any $\theta$, $\Delta t$ small enough and as specified in Section IV, and using norm (1), the TP policy minimizes

^E[l|n(n  +  At)-0|£  |n(Tfe)] 

R  R 

i 

r  R 

E1^  rr1  (rfc)}ilI1(Tfc)-er+er'- 


r=  l  r!—  1 
R  f  R 

+  ^  ^  Pt 

r- 1  Lr'=l 


R  R 

<l7> 

r=l  r'=0 


Let now $x_{rr'}(\tau_k) = 1\{B_{rr'}(\tau_k)\}$ for $r = 1, \ldots, R$ and $r' = 0, 1, \ldots, R$, $y_{rr'}(\tau_k) = 1\{A_{rr'}(\tau_k)\}$ for $r, r' = 1, \ldots, R$, and denote $x(\tau_k) = (1\{B_{10}(\tau_k)\}, \ldots, 1\{B_{RR}(\tau_k)\})$, $y(\tau_k) = (1\{A_{11}(\tau_k)\}, \ldots, 1\{A_{RR}(\tau_k)\})$. Discarding constants, the RHS of (17) can be written as

$$x(\tau_k)' q_1(n(\tau_k), \theta, \beta) + y(\tau_k)' q_2(n(\tau_k), \theta, \beta) + (\text{terms of higher order in } \Delta t)$$

where $q_i(n(\tau_k), \theta, \beta)$, $i = 1, 2$, are appropriately defined. Since for small enough $\Delta t$ the first two terms dominate, to implement the TP policy with norm $\|\cdot\|_\beta$ we will be solving the following LP problem at each epoch $\tau_k$:


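$$\begin{aligned}
\text{(LP2)} \quad \min \quad & x(\tau_k)' q_1(n(\tau_k), \theta, \beta) + y(\tau_k)' q_2(n(\tau_k), \theta, \beta) \\
\text{s.t.} \quad & \sum_{r \in C_i} \sum_{r'=0}^{R} x_{rr'}(\tau_k) \le 1, \quad \forall i \\
& \sum_{r'=0}^{R} x_{rr'}(\tau_k) \le n_r(\tau_k), \quad \forall r \\
& \sum_{r'=1}^{R} y_{rr'}(\tau_k) = 1, \quad \forall r \\
& x(\tau_k),\, y(\tau_k) \ge 0
\end{aligned} \qquad (18)$$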


where $(x(\tau_k), y(\tau_k))$ is the decision vector. In the case of a work-conserving TP policy, the first inequality constraint above becomes an equality, except at nodes with no jobs present. It should be noted that situations where a class can only be routed to a subset of other classes are easily accommodated; one needs to simply add constraints of the form $x_{rr'}(\tau_k) = 0$ and $y_{rr'}(\tau_k) = 0$ if $r$ cannot be routed to $r'$. Again, as was the case with (LP1), the work to solve (LP2) can be distributed across the nodes of the network with node $i$ deciding for $x_{rr'}(\tau_k)$ and $y_{rr'}(\tau_k)$ with $r \in C_i$. Furthermore, each node needs only local state information, i.e., state information for all classes served at the node and all classes the node can route jobs to.

1716

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 49, NO. 10, OCTOBER 2004

Finally, the discussion of Section VI applies intact to the extended model considered here and a simulation-based method to optimize over policy parameters, $\theta$ and $\beta$, is applicable. Since the optimal solution of (LP2) is integer, a smooth TP policy (as in Section VI) must be employed to that end.
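Because (LP2) decomposes across nodes and has an integer optimal solution, each node can pick its decision by inspecting a handful of cost coefficients. The sketch below illustrates this under stated assumptions: the coefficients of $q_1$ and $q_2$ are taken as given inputs (the text leaves them "appropriately defined"), disallowed routes are simply absent from the dictionaries, and the names `schedule_at_node` and `route_arrival` are hypothetical.

```python
def schedule_at_node(classes, n, q1, work_conserving=True):
    """Scheduling part of (LP2) at one node.

    classes: classes served at this node, n[r]: queue length of class r,
    q1[r][dest]: cost coefficient of x_{r,dest}; dest is a class or "exit".
    Returns the pair (r, dest) to serve and route next, or None to idle.
    """
    best, best_cost = None, None
    for r in classes:
        if n[r] == 0:            # x_{r,.} <= n_r rules out empty classes
            continue
        for dest, cost in q1[r].items():
            if best_cost is None or cost < best_cost:
                best, best_cost = (r, dest), cost
    if best is None:
        return None
    # Without work conservation, idling (cost 0) beats any nonnegative cost.
    if not work_conserving and best_cost >= 0:
        return None
    return best


def route_arrival(r, q2):
    """Routing part of (LP2): send an external class-r arrival to the cheapest class."""
    return min(q2[r], key=q2[r].get)


n = {0: 2, 1: 0, 2: 1}                                       # hypothetical state
q1 = {0: {"exit": -1.0, 2: -3.0}, 1: {"exit": -9.0}}         # hypothetical costs
print(schedule_at_node([0, 1], n, q1))                       # class 1 is empty
print(route_arrival(0, {0: {0: 1.5, 2: 0.2}}))
```

The greedy choice is an extreme point of the per-node feasible set, which is why no LP solver is needed in this sketch; it only uses the queue lengths of classes served at the node and the coefficients of classes it can route to.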

A.  Fluid  Model  and  the  Fluid  TP  Policy 

We now proceed to establish the stability of TP policies in the combined scheduling/routing model. Let $A_{rr'}(t)$, $r, r' = 1, \ldots, R$, denote the number of external class $r$ arrivals routed to class $r'$ upon arrival in the time interval $[0, t]$. Let also $T_{rr'}(t)$, $r = 1, \ldots, R$, $r' = 0, \ldots, R$, denote the cumulative amount of time server $\sigma(r)$ has spent in the time interval $[0, t]$ working on class $r$ jobs that are routed to class $r'$. In the fluid model, for all $t \ge 0$, the dynamics of the network satisfy

$$\begin{aligned}
\dot n_r(t) &= \sum_{r'=1}^{R} \lambda_{0r'} \alpha_{r'r}(t) + \sum_{r'=1}^{R} \mu_{r'} u_{r'r}(t) - \mu_r \sum_{r'=0}^{R} u_{rr'}(t), & r &= 1, \ldots, R \\
\sum_{r \in C_i} \sum_{r'=0}^{R} u_{rr'}(t) &\le 1, & i &= 1, \ldots, N \\
\sum_{r'=1}^{R} \alpha_{rr'}(t) &= 1, & r &= 1, \ldots, R \\
\alpha_{rr'}(t) &\ge 0, & r, r' &= 1, \ldots, R \\
u_{rr'}(t) &\ge 0, & r &= 1, \ldots, R,\ r' = 0, \ldots, R
\end{aligned} \qquad (19)$$

where $u_{rr'}(t) = \dot T_{rr'}(t)$ is the fraction of the capacity of server $\sigma(r)$ allocated at time $t$ to class $r$ jobs that are routed to class $r'$, and $\alpha_{r'r}(t) = \dot A_{r'r}(t) / \lambda_{0r'}$ is the fraction of class $r'$ external arrivals routed to class $r$ upon their arrival at time $t$. The equations in (19) hold for all regular times $t$.

Following the same reasoning as in Section V-A, for all $t$ and $\theta$ the fluid version of the TP policy selects the variables $\alpha_{rr'}(t)$ and $u_{rr'}(t)$ to minimize

$$\frac{d}{dt} \|n(t)\|_\beta^2$$

where $n(t)$ is the fluid limit of the stochastic system satisfying (19). Regarding the constraints under which this minimization is performed, the discussion of Section V-A applies. Specifically, the FTP policy using the L2 norm of $n(t)$ needs to satisfy $u_{rr'}(t) = 0$ whenever $n_r(t) = 0$ for all $r$, $r'$, and $t$.

B.  Stability  Analysis 

The following proposition is similar to Prop. V.1 and establishes a form of stability for the fluid model using the nonwork-conserving FTP policy under norm $\|n(t)\|_\beta$.

Proposition VII.1: Consider the fluid model operating under the nonwork-conserving FTP policy which uses norm $\|n(t)\|_\beta$, where $\beta > 0$. Suppose there exists a routing probability matrix $P = \{p_{rr'}\}_{r,r'=1}^{R}$ and nonnegative $y_{rr'}$, $r, r' = 1, \ldots, R$, such that

$$\begin{aligned}
\lambda_r &= \sum_{r'=1}^{R} \lambda_{0r'} y_{r'r} + \sum_{r'=1}^{R} \lambda_{r'} p_{r'r}, & r &= 1, \ldots, R \\
\sum_{r'=1}^{R} y_{rr'} &= 1, & r &= 1, \ldots, R \\
y_{rr'} &\ge 0, & r, r' &= 1, \ldots, R \\
\sum_{r \in C_i} \frac{\lambda_r}{\mu_r} &< 1, & i &= 1, \ldots, N
\end{aligned}$$

and $(I - P')$ is invertible. Then for every solution of the fluid equations (19) satisfying $|n(0)| \le 1$ and $u_{rr'}(t) = 0$ whenever $n_r(t) = 0$ for all $r$, $r'$, and $t$, there exists some $\delta(\eta) > 0$ such that for all $0 < \eta < 1$ and all $t \ge \delta$ it follows $|n(t)| \le \eta$.

Proof: The proof is similar to the one of Prop. V.1. Fix $\eta \in (0, 1)$. Let again $B = \mathrm{diag}(\beta_1, \ldots, \beta_R)$ and

$$G(t) = \|n(t)\|_\beta^2 = n'(t) B n(t).$$

Using the fluid model dynamics of (19) we obtain

$$\dot G(t) = 2 \sum_{r=1}^{R} \beta_r n_r(t) \Big[ \sum_{r'=1}^{R} \lambda_{0r'} \alpha_{r'r}(t) + \sum_{r'=1}^{R} \mu_{r'} u_{r'r}(t) - \mu_r \sum_{r'=0}^{R} u_{rr'}(t) \Big].$$

Let us adopt a policy that decomposes routing and scheduling decisions. More specifically, we employ a (fixed) routing policy that uses a routing matrix $P$ and nonnegative $y_{rr'}$ that satisfy the set of equations given in the statement of the proposition. As in the pure scheduling problem, $u_r(t) = \sum_{r'=0}^{R} u_{rr'}(t)$ denotes the fraction of the capacity of server $\sigma(r)$ allocated to class $r$ at time $t$. Under this fixed routing policy, $\alpha_{r'r}(t) = y_{r'r}$ and $u_{r'r}(t) = u_{r'}(t) p_{r'r}$ for all $t$, yielding

$$\dot G(t) = 2 \sum_{r=1}^{R} \beta_r n_r(t) \Big[ \sum_{r'=1}^{R} \lambda_{0r'} y_{r'r} + \sum_{r'=1}^{R} \mu_{r'} u_{r'}(t) p_{r'r} - \mu_r u_r(t) \Big]. \qquad (20)$$

The FTP policy defined in the statement of the proposition minimizes $\dot G(t)$ over routing/scheduling decisions, thus, the resulting $\dot G(t)$ is less than or equal to the one in (20) for all $t$.

We have now reduced the problem to the exact same scheduling problem addressed in Prop. V.1, namely, an open MQNET with fixed routing matrix $P$ and external Poisson arrival rate equal to $\sum_{r'=1}^{R} \lambda_{0r'} y_{r'r}$ for class $r$. The first of the set of equations in the statement of the proposition is the traffic equation while the last is the usual stability condition at each node. Following the steps of the proof of Prop. V.1 we can establish the desired result. ■
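The hypotheses of Proposition VII.1 are easy to check numerically for a candidate pair $(P, y)$. The sketch below is an illustration under stated assumptions: it solves the traffic equation by fixed-point iteration and then evaluates the node loads; the function names and the two-class example are hypothetical.

```python
def effective_rates(lam0, Y, P, iters=1000):
    """Solve lam_r = sum_s lam0[s]*Y[s][r] + sum_s lam[s]*P[s][r] by iteration.

    lam0: external arrival rates, Y[s][r]: fraction of external class-s
    arrivals routed to class r, P[s][r]: routing probability after service.
    """
    R = len(lam0)
    ext = [sum(lam0[s] * Y[s][r] for s in range(R)) for r in range(R)]
    lam = list(ext)
    for _ in range(iters):
        lam = [ext[r] + sum(lam[s] * P[s][r] for s in range(R)) for r in range(R)]
    return lam


def node_loads(lam, mu, node_classes):
    """Load sum_{r in C_i} lam_r / mu_r at every node i."""
    return [sum(lam[r] / mu[r] for r in classes) for classes in node_classes]


lam0 = [1.0, 0.0]                 # only class 0 has external arrivals
P = [[0.0, 0.2], [0.0, 0.0]]      # class 0 feeds class 1 with probability 0.2
mu = [1.0, 1.0]
nodes = [[0], [1]]                # one class per node

split = effective_rates(lam0, [[0.5, 0.5], [0.0, 1.0]], P)
no_split = effective_rates(lam0, [[1.0, 0.0], [0.0, 1.0]], P)
print(node_loads(split, mu, nodes), node_loads(no_split, mu, nodes))
```

In this example, splitting the external arrivals evenly keeps both node loads strictly below one, whereas sending every arrival to class 0 drives the first node to load one, so only the first routing choice satisfies the stability condition of the proposition.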

Following the same steps as in the proof of Theorem V.2 we can also establish that Proposition VII.1 implies the stability of the stochastic system. The main result for the TP policy in the combined routing/scheduling model is summarized next.

Theorem VII.2: Consider the MQNET of this section involving both sequencing and routing decisions and operated under the TP policy using norm $\|n(t)\|_\beta$, where $\beta > 0$. Suppose there exists a routing probability matrix $P = \{p_{rr'}\}_{r,r'=1}^{R}$ and nonnegative $y_{rr'}$, $r, r' = 1, \ldots, R$, such that

$$\begin{aligned}
\lambda_r &= \sum_{r'=1}^{R} \lambda_{0r'} y_{r'r} + \sum_{r'=1}^{R} \lambda_{r'} p_{r'r}, & r &= 1, \ldots, R \\
\sum_{r'=1}^{R} y_{rr'} &= 1, & r &= 1, \ldots, R \\
y_{rr'} &\ge 0, & r, r' &= 1, \ldots, R \\
\sum_{r \in C_i} \frac{\lambda_r}{\mu_r} &< 1, & i &= 1, \ldots, N
\end{aligned}$$

and $(I - P')$ is invertible. Then, the corresponding Markov chain $n(t)$ is positive Harris recurrent.


VIII. Closed Networks

In this section, we consider the case of closed MQNETs and introduce a class of TP policies for such systems. We first introduce the model and define TP policies, then discuss their implementation, and finally use fluid analysis to investigate their efficiency. The notion of efficiency of scheduling policies in closed networks has been introduced in [14]; to accommodate our more general model of closed networks, we will extend a sufficient condition for efficiency established there.

To define the class of closed networks of interest, consider the open MQNET of Section II. Here, however, there are no external arrivals ($\lambda_{0r} = 0$, $\forall r$) and the probability a job exits the network is zero ($p_{r0} = 0$, $\forall r$). Routing is fixed and not subject to optimization; at the end of this section we comment on how our work can be extended to address routing as well. The routing probability matrix $P$ defines a number, say $K$, of noncommunicating classes which we call types; a class of type $k$ can never be routed to a class of some other type not equal to $k$. We use the notation type($r$) to denote the type of class $r$. In the closed network, we fix to $S_k$ the number of jobs of type $k$ and (to exclude trivial cases) assume $S_k > 0$ for all $k = 1, \ldots, K$. Let $S = (S_1, \ldots, S_K)$.

Let us again uniformize the Markov chain $n(t)$, use the uniform transition rate $\nu = \sum_{r=1}^{R} \mu_r$, and denote by $\{\tau_k\}$ the sequence of transition epochs. Denote by $\lambda_r = \lim_{k \to \infty} \mu_r E[1\{B_r(\tau_k)\}]$ the throughput of class $r$, where, as before, $B_r(\tau_k)$ denotes the event that node $\sigma(r)$ is working on class $r$ right after time $\tau_k$. We are interested in a scheduling policy maximizing

$$\sum_{r=1}^{R} h_r \lambda_r \qquad (21)$$

where $h = (h_1, \ldots, h_R) \ge 0$ are given weights.

Let $x(t) = (1\{B_1(t)\}, \ldots, 1\{B_R(t)\})$ be the vector of scheduling decisions at time $t$. Recall $M = \mathrm{diag}(\mu_1, \ldots, \mu_R)$. We define TP policies for closed MQNETs as follows.

Definition 2: We define as target-pursuing (TP) the class of scheduling policies for closed MQNETs which at each time $t$ minimize $\|Mx(t) - \theta\|$ for some norm $\|\cdot\|$.


In the uniformized Markov chain and for any weighted norm $\|\cdot\|_\beta$, with $\beta > 0$, implementing a TP policy amounts to solving the following optimization problem at every epoch $\tau_k$:

$$\begin{aligned}
\text{(OPT3)} \quad \min \quad & \|Mx(\tau_k) - \theta\|_\beta \\
\text{s.t.} \quad & \sum_{r \in C_i} x_r(\tau_k) \le 1, \quad i = 1, \ldots, N \\
& 0 \le x(\tau_k) \le n(\tau_k)
\end{aligned} \qquad (22)$$

where $x(\tau_k)$ is the decision vector. For the weighted L2 norm of (1) this is a quadratic programming (QP) problem for which efficient (i.e., polynomial time) interior-point algorithms exist. The work for solving (OPT3) can be decomposed and distributed across the various nodes along the lines of Section IV. In the case of a work-conserving TP policy, the first inequality constraint of (OPT3) becomes an equality at all nodes with jobs present. It should be noted that when all classes are nonempty, the optimal solution of (OPT3) does not depend on time and thus, it only needs to be solved once. The resulting policy is simply the projection of $(\theta_1/\mu_1, \ldots, \theta_R/\mu_R)$ onto the feasible set of (OPT3), which is a static (i.e., time-independent) processor sharing policy. However, when empty classes exist, some of the decision variables are forced to zero (due to the constraint $x(\tau_k) \le n(\tau_k)$) and the static policy is adjusted to avoid allocating capacity to empty classes.
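For the weighted L2 norm, (OPT3) separates across nodes, and at a single node it is a small box-constrained QP with one coupling constraint. The sketch below is one possible way to solve it, not the interior-point method mentioned in the text: it uses bisection on the multiplier of the capacity constraint. The function name `tp_allocation_at_node` and the argument `cap` (0 for an empty class, 1 otherwise) are assumptions made for illustration.

```python
def tp_allocation_at_node(theta, mu, beta, cap, iters=100):
    """Minimise sum_r beta[r] * (mu[r]*x[r] - theta[r])**2 over one node.

    Constraints: sum_r x[r] <= 1 and 0 <= x[r] <= cap[r].
    Solved by bisection on the multiplier nu of the capacity constraint.
    """
    def x_of(nu):
        # Stationarity 2*beta*mu*(mu*x - theta) + nu = 0, then clip to the box.
        return [min(cap[r], max(0.0, theta[r] / mu[r] - nu / (2 * beta[r] * mu[r] ** 2)))
                for r in range(len(theta))]

    x = x_of(0.0)
    if sum(x) <= 1.0:
        return x                       # capacity constraint is slack
    lo, hi = 0.0, 1.0
    while sum(x_of(hi)) > 1.0:         # grow the bracket until feasible
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(x_of(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return x_of(hi)


# Hypothetical node with two classes and target theta = (0.8, 0.6).
print(tp_allocation_at_node([0.8, 0.6], [1.0, 1.0], [1.0, 1.0], cap=[1, 1]))
print(tp_allocation_at_node([0.8, 0.6], [1.0, 1.0], [1.0, 1.0], cap=[1, 0]))
```

With both classes nonempty the target is infeasible at the node and the allocation is the projection, approximately (0.6, 0.4); when the second class is empty its share is forced to zero and the first class simply receives its target 0.8.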

As with open networks, [1] derives an LP whose optimal value is an upper bound on the optimal weighted throughput of (21). This bound is often tight and the associated optimal solution can provide one potential target $\theta$.

For closed MQNETs the discussion of Section VI applies and one can use a simulation-based method to optimize over the policy parameters $\theta$ and $\beta$. Notice that (OPT3) is a QP problem, thus, the use of a randomized policy is not necessary since the optimal solution is smooth in the policy parameters.

A.  Efficiency  of  TP  Policies  for  Closed  Networks 

We  next  follow  [14]  and  discuss  the  efficiency  of  the  TP 
policy.  To  that  end,  we  work  with  a  fluid  model. 

1) Fluid Model: Consider the stochastic system and let $D_r(t)$ denote the number of departures from class $r$ in $[0, t]$, and $T_r(t)$ the amount of time server $\sigma(r)$ spends working on class $r$ in $[0, t]$. Let also $z = n(0)$ denote the initial condition at time zero, assuming that $z$ is in the support of $S$ ($\sum_{\{r \mid \mathrm{type}(r) = k\}} z_r = S_k > 0$ for all $k = 1, \ldots, K$). To obtain the fluid model we use the same fluid scaling as in open networks, and consider sequences of initial condition vectors $z_l = lz$ with $l \to \infty$. We use a bar to indicate various quantities of interest in the fluid model, in particular, $\bar n(t)$ denotes the queue length vector. We have

$$\bar n^l(t) = \frac{1}{|z_l|}\, n^l(|z_l| t)$$

where superscript $l$ indicates quantities in a system initialized with $z_l$ jobs. Using the exact same analysis as in [14], for every sequence of initial conditions $z_l$ there exists a subsequence $z_{l_j}$ such that along this subsequence and as $j \to \infty$

$$\bar n^{l_j}(\cdot, \omega) \to \bar n(\cdot) \quad \text{u.o.c.}$$

where $\omega$ is a sample path satisfying the SLLN for service and routing processes. This result is analogous to the one obtained in [20] for open networks. For the limit processes, we have


$$\begin{aligned}
\bar D_r(t) &= \mu_r \bar T_r(t), & r &= 1, \ldots, R \\
\bar n_r(t) &= \bar n_r(0) + \sum_{r'=1}^{R} p_{r'r} \bar D_{r'}(t) - \bar D_r(t), & r &= 1, \ldots, R \\
\sum_{r \in C_i} \bar T_r(t) &\le t, & i &= 1, \ldots, N \\
\sum_{r=1}^{R} \bar n_r(0) &= 1, \qquad \sum_{\{r \mid \mathrm{type}(r) = k\}} \bar n_r(0) = \frac{S_k}{|z|}, & k &= 1, \ldots, K \\
\bar n_r(t),\ \bar T_r(t) &\ge 0, & r &= 1, \ldots, R
\end{aligned} \qquad (23)$$

along with some additional equations provided in [14]. Let $\bar u_r(t) = \dot{\bar T}_r(t)$ denote the fraction of the capacity allocated to class $r$ by node $\sigma(r)$, where the derivative is defined at regular times. We next derive the fluid version of the TP policy of Definition 2, using a weighted L2 norm $\|\cdot\|_\beta$ and target $\theta$.

It can be seen that in the fluid limit the policy selects allocations $\bar u(t) = (\bar u_1(t), \ldots, \bar u_R(t))$ satisfying the dynamics in (23) and minimizing $\|M \bar u(t) - \theta\|_\beta$. Consider next the constraints under which the TP policy in the stochastic system makes decisions (cf. (OPT3)) and note that the policy idles on an empty class. The fluid version of the TP policy selects allocations $\bar u(t)$ satisfying (23) with the additional constraint that one cannot allocate capacity to empty classes. We will refer to this policy as the FTP policy.

We are now ready to formally define the notion of efficiency of scheduling policies for our closed MQNET, which is an extension of a similar definition in [14] that applies to closed networks with a single type. Consider the following LP:

$$\begin{aligned}
\text{(Eff-LP)} \quad \max \quad & \sum_{r=1}^{R} h_r \lambda_r \\
\text{s.t.} \quad & \sum_{r \in C_i} \frac{\lambda_r}{\mu_r} \le 1, \quad i = 1, \ldots, N \\
& \lambda = P' \lambda
\end{aligned} \qquad (24)$$

where $\lambda$ is the decision vector. The first inequality constraint bounds the utilization of all servers by one and the second constraint is the set of traffic equations for the closed network. These latter equations have a unique solution up to a multiplicative constant. Let $\lambda^* = (\lambda_1^*, \ldots, \lambda_R^*)$ be an optimal solution of (Eff-LP). We can view $h'\lambda^*$ as the maximal weighted throughput sustainable by the network. Note that at least one of the inequality (utilization) constraints is tight at the optimal solution. Any node corresponding to a tight utilization constraint at $\lambda^*$ will be called a bottleneck node.

Let now $\lambda^\pi(z)$ denote the throughput vector achieved under a stationary policy $\pi$ when the closed network is initialized with $n(0) = z$. We define the efficiency of $\pi$ as follows.

Definition 3: The stationary policy $\pi$ is said to be efficient under the cost structure $h$ if for every sequence of initial conditions $z_l = lz$ with $z$ in the support of $S$ and $l \to \infty$ we have

$$\lim_{l \to \infty} \sum_{r=1}^{R} h_r \lambda_r^\pi(z_l) = \sum_{r=1}^{R} h_r \lambda_r^*$$

where $\lambda^*$ is an optimal solution of (Eff-LP).
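For a network with a single type, (Eff-LP) can be solved without an LP solver, because the traffic equations determine $\lambda$ up to scale. The sketch below illustrates this special case only; it assumes irreducible routing, nonnegative weights, and a nonnegative $\lambda$, and the function name `bottleneck_throughput` and the example data are hypothetical.

```python
def bottleneck_throughput(P, mu, node_classes, iters=500):
    """Solve (Eff-LP) for a closed network with a single type.

    With one type, lam = P' lam fixes lam up to a positive scale, so the
    optimum scales lam until the busiest node is fully utilised.
    Returns the throughput vector and the list of bottleneck nodes.
    """
    R = len(mu)
    v = [1.0 / R] * R
    for _ in range(iters):
        # Lazy iteration v <- (v + P'v)/2 converges even for periodic routing.
        Pv = [sum(P[s][r] * v[s] for s in range(R)) for r in range(R)]
        v = [0.5 * (v[r] + Pv[r]) for r in range(R)]
    load = [sum(v[r] / mu[r] for r in classes) for classes in node_classes]
    scale = 1.0 / max(load)
    lam = [scale * x for x in v]
    bottlenecks = [i for i, l in enumerate(load) if abs(l * scale - 1.0) < 1e-9]
    return lam, bottlenecks


# Two classes, one per node; class 1 returns to itself with probability 0.5.
lam_star, bottlenecks = bottleneck_throughput(
    P=[[0.0, 1.0], [0.5, 0.5]], mu=[1.0, 4.0], node_classes=[[0], [1]])
print(lam_star, bottlenecks)
```

Here the solution of the traffic equations is proportional to (1, 2), node 0 saturates first, and the resulting throughput vector is approximately (1, 2) with node 0 as the only bottleneck. With several types the scale of each type is a separate decision and a general LP solver would be needed.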

The following theorem generalizes [14, Th. 4.2] and provides a sufficient condition on efficiency based on the fluid limit. We omit the proof since it is very similar to the corresponding proof in [14].

Theorem VIII.1: Consider a stationary scheduling policy $\pi$ under which every fluid limit satisfies

$$\liminf_{t \to \infty} \sum_{r=1}^{R} \frac{h_r \bar D_r(t)}{t} \ge \sum_{r=1}^{R} h_r \lambda_r^* \quad \text{a.s.}$$

Then $\pi$ is efficient under the cost structure $h$.

Another, and perhaps more convenient, way to express this sufficient condition is provided by the following corollary. The proof is immediate since the condition below implies the sufficient condition of Theorem VIII.1.

Corollary VIII.2: Consider a stationary scheduling policy $\pi$ under which for every fluid limit there exists a time $T < \infty$ such that for all regular times $t \ge T$

$$\sum_{r=1}^{R} h_r \dot{\bar D}_r(t) \ge \sum_{r=1}^{R} h_r \lambda_r^*.$$

Then, $\pi$ is efficient under the cost structure $h$.

We will use Corollary VIII.2 to investigate the efficiency of the TP policies for closed networks we defined earlier. Our main result is stated in the following theorem.

Theorem VIII.3: Consider the TP policy with target $\theta = \lambda^*$ using a weighted L2 norm $\|\cdot\|_\beta$ with $\beta > 0$. This TP policy is efficient under the cost structure $h$.

Proof: Recall that the FTP satisfies $\bar u_r(t) = 0$ if $\bar n_r(t) = 0$. Without loss of generality, assume that initially class $r$ is empty (if multiple classes are empty, the analysis is the same). Hence, the FTP policy selects $\bar u(t) = (\lambda_1^*/\mu_1, \ldots, \lambda_{r-1}^*/\mu_{r-1}, 0, \lambda_{r+1}^*/\mu_{r+1}, \ldots, \lambda_R^*/\mu_R)$ since this minimizes $\|M \bar u(t) - \theta\|_\beta$ subject to the proper constraints. Using this policy and after a small time interval $\delta$ class $r$ will cease to be empty (due to arrivals from other classes). At that point in time, the FTP policy switches to the allocation $\bar u^*(t) = (\lambda_1^*/\mu_1, \ldots, \lambda_r^*/\mu_r, \ldots, \lambda_R^*/\mu_R)$ since it minimizes $\|M \bar u(t) - \theta\|_\beta$ subject to the proper constraints. Notice that we now have flow balance, i.e., the departing flow rate always equals the arriving flow rate for all classes. Therefore, at any time $t \ge \delta$, no class is empty, and the same allocation $\bar u^*(t)$ remains in effect. This allocation achieves a throughput of $\sum_{r=1}^{R} h_r \mu_r \bar u_r^*(t) = \sum_{r=1}^{R} h_r \lambda_r^*$, that is, the TP policy is efficient under cost structure $h$. ■
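The flow-balance step of the proof can be verified directly from the fluid dynamics: under the allocation $\lambda_r^*/\mu_r$ every fluid level has zero drift, while a class that is temporarily empty (and hence unserved) fills up. The sketch below is a small check of this under assumed example data; `fluid_drift` is a hypothetical helper name.

```python
def fluid_drift(u, mu, P):
    """Drift of each fluid level in the closed network: inflow minus outflow.

    u[r]: capacity fraction given to class r, mu[r]: service rate,
    P[s][r]: routing probability from class s to class r.
    """
    R = len(u)
    return [sum(P[s][r] * mu[s] * u[s] for s in range(R)) - mu[r] * u[r]
            for r in range(R)]


P = [[0.0, 1.0], [0.5, 0.5]]
mu = [1.0, 4.0]
lam_star = [1.0, 2.0]          # satisfies lam = P' lam for this routing matrix

u_star = [lam_star[r] / mu[r] for r in range(2)]
print(fluid_drift(u_star, mu, P))        # flow balance: zero drift everywhere
print(fluid_drift([0.0, 0.5], mu, P))    # class 0 empty and unserved: it fills up
```

The second call shows a strictly positive drift for the unserved class, which is the reason it ceases to be empty after a short interval.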

We conclude this section by outlining how routing decisions can be incorporated in our setting. As in Section VII, we can define variables $x_{rr'}(t) = 1\{B_{rr'}(t)\}$ where $B_{rr'}(t)$ denotes the event that at time $t$ node $\sigma(r)$ is working on a class $r$ job that will be routed to class $r'$ upon completion of service. Let $y(t) = (y_{11}(t), \ldots, y_{RR}(t))$ where $y_{rr'}(t) = \mu_r x_{rr'}(t)$, and $\theta = (\theta_{11}, \ldots, \theta_{RR})$. We can then define TP policies as the class of combined scheduling/routing policies which at each time $t$ minimize $\|y(t) - \theta\|$ for some norm $\|\cdot\|$. In fact, the problem can be transformed to a pure scheduling problem. To that end, split each class $r$ to at most $O(R)$ classes, one for every possible class $r'$ that class $r$ jobs can be routed to. In this modified closed MQNET, there are only scheduling decisions to be made and routing is deterministic.


Fig. 2. Example 1: There are two types of jobs with Poisson arrival rates $\lambda_A$ and $\lambda_B$ (and 3 classes indicated on the figure). All jobs require exponential service time with rate $\mu_1$ and $\mu_2$ at nodes 1 and 2, respectively.


IX.  Numerical  Results 

Next,  we  present  some  illustrative  numerical  results  to  assess 
the  performance  of  TP  policies.  We  implement  the  policy  under 
work-conserving  constraints.  Non-work-conserving  TP  policies 
typically  result  in  worst  performance. 

A.  Open  Netoorks 

The first example we consider is the two-node network of
Fig. 2. In Table I we compare several work-conserving scheduling
policies with $h = e$. The parameters for the various traffic
scenarios are listed in Table II, where we use $\rho = (\rho_1, \rho_2)$
to denote the utilizations of nodes 1 and 2, respectively. We use
the following abbreviations for the traffic scenarios we
considered: I.L. (imbalanced light), B.L. (balanced light), I.M.
(imbalanced medium), B.M. (balanced medium), I.H. (imbalanced
heavy), and B.H. (balanced heavy). The second column
of Table I (ALP) lists the lower bound on optimal performance
obtained by solving the achievable region LP of [3] (see Section III).
The third column of Table I (DP) lists the optimal performance
obtained via dynamic programming; the entry for the last row is
missing because it was computationally intractable to obtain.
The fourth column of Table I [TP($w^*$)] reports the performance
(obtained by simulation) of the TP policy using the L2 norm
for $n(t)$ with target $\theta$ equal to the optimal solution $w^*$ of the
achievable region LP and norm weight vector $\beta = e$. The fifth
column of Table I (OTP) reports the performance of the TP
policy using the same norm but with optimized (as discussed
in Section VI) policy parameters. For the first four rows, we
only optimized over $\theta$ and used $\beta = e$. For the last two (heavy
traffic) rows we also optimized over $\beta$ and report those results
in brackets. The optimal $\beta$ turned out to be (1, 3.4, 7.2) for I.H.
and (1, 2.6, 11.2) for B.H. In the sixth column of
Table I (Thr.), we list the performance of a threshold policy
proposed in [3] based on heavy traffic analysis, which is
conjectured to be asymptotically optimal in heavy traffic. According
to this policy, priority is given to type A jobs at node 1 if the
number of jobs at node 2 is below some threshold; otherwise
priority is given to type B jobs. The results listed in column 6
of Table I are for the best such policy (i.e., optimized over the
threshold). Finally, in the last column of Table I we report the
percentage distance between the best policy we came up with (the OTP
column in this case) and the best other policy found. In particular,
$\text{Gap} = [(\text{Best Ours}) - (\text{Best Other})] \times 100\% / (\text{Best Other})$.
For the reader's convenience, we use bold for these two values.
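
As a quick illustration of the Gap definition above, the following minimal sketch recomputes two entries of the last column of Table I from the performance values in the same rows. The function name `gap_percent` and the dictionary layout are ours, introduced only for this example.

```python
def gap_percent(best_ours: float, best_other: float) -> float:
    """Relative gap, in percent, between our best policy and the best alternative."""
    return (best_ours - best_other) * 100.0 / best_other


# Performance values copied from the I.L. and I.H. rows of Table I.
# I.L.: our best is OTP (0.678); the best other policy is DP (0.671).
# I.H.: our best is the bracketed OTP entry (10.13); the best other is DP (9.97).
rows = {
    "I.L.": (0.678, 0.671),
    "I.H.": (10.13, 9.97),
}

for name, (ours, other) in rows.items():
    print(f"{name}: {gap_percent(ours, other):.1f}%")
# prints "I.L.: 1.0%" and "I.H.: 1.6%"
```

Because the performance measure is a cost, a positive gap means our policy is slightly worse than the best alternative, and a negative gap means it is better.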


TABLE I

Results for Example 1 of Fig. 2

Scenario  ALP    DP     TP(w*)  OTP            Thr.   Gap
I.L.      0.63   0.671  0.678   0.678          0.679  1.0%
B.L.      0.73   0.843  0.866   0.856          0.857  1.5%
I.M.      1.9    2.084  2.119   2.117          2.129  1.6%
B.M.      2.1    2.829  2.96    2.895          2.895  2.3%
I.H.      9.6    9.97   10.36   10.33 [10.13]  10.15  1.6%
B.H.      9.9    -      18.0    17.4 [15.5]    15.5   0%

TABLE II

Parameters for the Traffic Scenarios of Table I

Scenario  $\lambda_A$  $\lambda_B$  $\mu_1$  $\mu_2$  $\rho_1$  $\rho_2$
I.L.      0.3          0.3          2        1.5      0.3       0.2
B.L.      0.3          0.3          2        1        0.3       0.3
I.M.      0.6          0.6          2        1.5      0.6       0.4
B.M.      0.6          0.6          2        1        0.6       0.6
I.H.      0.9          0.9          2        1.5      0.9       0.6
B.H.      0.9          0.9          2        1        0.9       0.9
A couple of remarks are in order. First, the TP policy using
$\theta = w^*$ performs well in light to moderate traffic scenarios.
This is appealing since $w^*$ can be computed in polynomial
time by solving the achievable region LP. It is interesting to see
that the optimal solution of this LP can lead to a fairly good
policy. The optimized TP policy performs even better and is
close to optimal. In the heavy-traffic cases (especially B.H.)
using a weighted norm improves performance. The numerical
results suggest that $\beta_3 > \max(\beta_1, \beta_2)$ is appropriate for those
cases. This is to be expected since as $\beta_3 \to \infty$ the TP policy
approaches the threshold policy of [3] with threshold $\theta_3$, and the
latter policy is known to be effective in heavy traffic.
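
For reference, the threshold rule of [3] as described in this section (priority to type A jobs at node 1 while the number of jobs at node 2 is below a threshold, and to type B jobs otherwise) can be sketched in a few lines. The function name, argument names, and the string labels are hypothetical and chosen only for illustration; the rule itself is taken from the description above.

```python
def threshold_priority(jobs_at_node2: int, threshold: int) -> str:
    """Return the job type that receives priority at node 1.

    Type A has priority while node 2 holds fewer jobs than the threshold;
    otherwise type B has priority.
    """
    if jobs_at_node2 < 0 or threshold < 0:
        raise ValueError("job counts and thresholds must be nonnegative")
    return "A" if jobs_at_node2 < threshold else "B"


# Example with an illustrative threshold of 3 jobs.
for n2 in range(5):
    print(n2, threshold_priority(n2, threshold=3))
```

The single scalar threshold is the only tunable parameter here, which is why the Thr. column of Table I can be obtained by a one-dimensional search.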

The second example we consider is the six-class network of
Fig. 3. The results are reported in Table III, where we use the
same notation and abbreviations as in Table I. The parameters
for the various traffic scenarios are listed in Table IV. In the fifth
column, optimization was done over $\theta$ keeping $\beta = e$. In the
last two rows of this column we also optimized over $\beta$ and report
the results in brackets. The sixth column (BPP) lists results from
the best strict priority policy we were able to find. Finally, as in
Table I, the last column reports the percentage gap between our best
policy and the best other policy found.
TABLE V

Results for Example 3 of Fig. 4

ALP   DP    TP(w*)  OTP   Gap
7.08  8.18  13.04   8.46  3.4%

Fig. 3. Example 2: There are two types of jobs with Poisson arrival rates $\lambda_A$
and $\lambda_B$. All jobs require exponential service times with rate $\mu_i$ for class
$i = 1, \ldots, 6$. The cost vector $h$ in the objective of (3) is set equal to $e$.


Fig. 4. Example 3: There are two types of jobs with Poisson arrival rates $\lambda_A$
and $\lambda_B$ and exponential service times with rate $\mu_i$ for class $i$. Again, $h = e$.


TABLE III

Results for Example 2 of Fig. 3

Scenario  ALP    DP     TP(w*)  OTP           BPP    Gap
I.L.      0.62   0.663  0.684   0.671         0.743  1.2%
B.L.      0.71   0.798  0.844   0.803         0.916  0.3%
I.M.      1.76   1.966  2.15    2.01          2.31   2%
B.M.      \M     2.56   2.81    2.59          3.07   0.8%
I.H.      7.63   -      9.41    8.45 [8.32]   9.21   -9.7%
B.H.      8.21   -      16      13.8 [13.6]   15.1   -9.9%

TABLE IV

Parameters for the Traffic Scenarios of Table III

Scenario  $\lambda_A$  $\lambda_B$  $\mu_1$  $\mu_2$  $\mu_3$  $\mu_4$  $\mu_5$  $\mu_6$
I.L.      3/140        3/140        1/4      2/3      1/8      1/4      1/2      3/14
B.L.      3/140        3/140        1/4      1        1/8      1/6      1/2      1/7
I.M.      6/140        6/140        1/4      2/3      1/8      1/4      1/2      3/14
B.M.      6/140        6/140        1/4      1        1/8      1/6      1/2      1/7
I.H.      9/140        9/140        1/4      2/3      1/8      1/4      1/2      3/14
B.H.      9/140        9/140        1/4      1        1/8      1/6      1/2      1/7

The conclusions in this more challenging network are similar.
The TP policies with target equal to $w^*$ perform quite well in
light to moderate traffic scenarios. In heavy traffic, performance
can be further improved by optimizing over the policy parameters
$(\theta, \beta)$. Overall, we are within 2% of the optimal (when it is
possible to compute) or we outperform the best other policy found
by more than 9%.

The third example we show is the Rybko-Stolyar network
[25], [26]. We used $\lambda_A = \lambda_B = 1$, $\mu_1 = \mu_3 = 6$, and
$\mu_2 = \mu_4 = 1.5$. It has been shown that the last-buffer-first-serve
policy is unstable with these parameters. In contrast, TP policies
perform well (Table V).
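
The following sketch checks the loads implied by these parameter values. It assumes the usual Rybko-Stolyar layout, which is not spelled out in this section: type A jobs visit station 1 (class 1) and then station 2 (class 2), while type B jobs visit station 2 (class 3) and then station 1 (class 4). Under that assumption both stations are nominally underloaded, yet the two exit classes 2 and 4, which receive priority under last-buffer-first-serve, together carry a load above one; this is the well-known condition under which that priority rule is unstable. The variable names are ours.

```python
from fractions import Fraction

# Parameter values from the text.
lam_A = lam_B = Fraction(1)
mu = {1: Fraction(6), 2: Fraction(3, 2), 3: Fraction(6), 4: Fraction(3, 2)}

# Assumed layout: station 1 serves classes 1 and 4, station 2 serves classes 2 and 3.
rho_station1 = lam_A / mu[1] + lam_B / mu[4]
rho_station2 = lam_A / mu[2] + lam_B / mu[3]

# Combined load of the two exit classes (2 and 4), which get priority
# under last-buffer-first-serve.
exit_class_load = lam_A / mu[2] + lam_B / mu[4]

print(rho_station1, rho_station2, exit_class_load)  # prints "5/6 5/6 4/3"
```

Since each station load is 5/6, a stabilizing policy exists; the exit-class load of 4/3 only rules out the specific priority rule.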


Fig. 5. Example 4.


TABLE VI

Results for the Routing Example (Example 4) of Fig. 5

Load    $\lambda$  $\mu$  $\rho$  ALP   SQ    OTP   Gap
Light   1.65       1.5    0.55    1.22  1.69  1.69  0%
Medium  2.1        1.5    0.7     2.33  2.94  2.94  0%
Heavy   2.7        1.5    0.9     9.00  9.56  9.56  0%

Our final open network example is the system of Fig. 5. Jobs
arrive according to a Poisson process of rate $\lambda$ and are to be
routed to either the top or the bottom node. Service times are
exponentially distributed with rate $\mu$ at both nodes. We need to
decide where to route each job in order to minimize the objective
of (3) with $h = e$. Table VI reports our results for three
traffic scenarios corresponding to $\rho = \lambda/(2\mu) = 0.55, 0.7, 0.9$,
respectively. The sixth column (SQ) lists the performance of the
policy that sends jobs to the shortest queue, which is known to
be optimal [27]. The seventh column lists the performance of
the TP policy using the L2 norm of $n(t)$, optimized over $\theta$
with $\beta = e$. The last column compares the two policies. It is
evident that the optimized TP policy achieves optimality. This
is to be expected since from the structure of (LP2) and (17) it
can be easily verified that any TP policy with target $\theta$ such that
$\theta_1 = \theta_2$ and $\beta = e$ makes routing decisions identical to the SQ
policy.
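
The equivalence with SQ can be illustrated with a simplified stand-in for the TP routing rule. The sketch below does not reproduce (LP2) or (17); it assumes a one-step rule that sends an arriving job to the node leaving the weighted squared distance from the target smallest. With $\beta = e$ and $\theta_1 = \theta_2$, the difference between the two candidate distances is $2(n_1 - n_2)$, independent of the common target value, so the rule picks the shorter queue. All function names are hypothetical.

```python
from itertools import product


def sq_route(n):
    """Shortest-queue routing: index of the node with fewer jobs (ties go to node 0)."""
    return 0 if n[0] <= n[1] else 1


def myopic_route(n, theta, beta=(1.0, 1.0)):
    """Assumed one-step rule: route to the node that minimizes the weighted
    squared distance from the target after the job joins (ties go to node 0)."""
    def dist_after(i):
        m = list(n)
        m[i] += 1
        return sum(b * (x - t) ** 2 for b, x, t in zip(beta, m, theta))
    return 0 if dist_after(0) <= dist_after(1) else 1


def agrees_with_sq(theta_common, max_jobs=20):
    """Check all states with at most max_jobs jobs per node."""
    theta = (theta_common, theta_common)
    return all(
        myopic_route(n, theta) == sq_route(n)
        for n in product(range(max_jobs + 1), repeat=2)
    )


print(all(agrees_with_sq(t) for t in (0.0, 2.5, 7.0)))
```

With unequal targets or unequal weights the two rules can disagree, which is consistent with the condition $\theta_1 = \theta_2$, $\beta = e$ stated above.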

B.  Closed  Networks 

We next present two closed network examples (cf. Fig. 6).
Results for these examples are reported in Table VII. The
second column lists the (fixed) number of jobs in the system
for each type. The third column (ALP) reports an upper bound
on the optimal weighted throughput obtained by the achievable
region LP of [1]. We denote by $w^*$ the target obtained from the
optimal solution of that LP. The fourth column (DP) lists the
optimal weighted throughput obtained by solving a dynamic
programming problem. The fifth column [TP($w^*$)] lists the
performance of the TP policy for closed networks using target
$\theta = w^*$ and the L2 norm $\|\cdot\|_2$. In some instances we have
optimized over the target $\theta$. We refer to the latter policy as
OTP and report the results in brackets. The sixth column lists
the performance of the TP policy using target $\theta$ equal to the
optimal solution $\lambda^*$ of (Eff-LP) in (24). Finally, the seventh
Fig. 6. Exponential service times with rate $\mu_i$ for class $i$. (a) Example 5: The total number of jobs equals $S$, $h = e/4$, $\mu = (1/3, 2/7, 1, 2)$. (b) Example 6: The number of type 1
jobs (classes 1-4) is fixed to $S_1$ and the number of type 2 jobs (classes 5-6) is fixed to $S_2$; $h = (1/4, 1/4, 1/4, 1/4, 1/2, 1/2)$, $\mu = (8, 5, 2, 7, 4, 1)$.


TABLE VII

Results for Examples 5 and 6 of Fig. 6

Example  S     ALP    DP     TP(w*) [OTP]   TP($\lambda^*$)  Gap
Ex. 5    10    10.53  10.49  10.28 [10.49]  10.27            0%
Ex. 5    100   11.05  11.05  11.05          11.05            0%
Ex. 6    5/5   1.914  1.904  1.882          1.7%             1.16%
Ex. 6    5/30  1.914  1.914  1.895          1.877            0.99%
Ex. 6    30/5  1.914  1.914  1.899          1.807            0.8%

column reports the distance of our best policy from the optimal,
namely, $\text{Gap} = [(\text{Optimal}) - (\text{Best Ours})] \times 100\% / (\text{Optimal})$.

In both examples, we conclude that the TP policy with target
$w^*$ is rather close to optimal. As in open networks, this suggests
that the optimal solution to the achievable region LP contains
useful information from which a "good" policy can be obtained.
The TP policy with target equal to $\lambda^*$ performs equally well.
It becomes near-optimal for large populations, which is to be
expected in light of the efficiency results of Section VIII-A.
Finally, the TP policy with optimized target $\theta$ is within
1-2% of the optimal in all cases considered.

X.  Conclusion 

We proposed a new class of what we call TP policies for
scheduling and routing in MQNETs. These networks can model
a variety of systems, including sensor networks, multiprocessor
computer systems, and manufacturing systems. In open networks,
external arrivals were assumed to be Poisson with class-dependent
rates, and in both open and closed networks service
times were assumed to be exponentially distributed with class-dependent
rates. These assumptions, although restrictive, can
even accommodate heavy-tailed service distributions by using
a hyperexponential approximation of these distributions. The
fluid version of TP policies belongs to a broader class of fluid
policies called greedy in [9]. In general, greedy or myopic policies
may perform extremely poorly; nonetheless, we were able
to demonstrate that our proposed class is rather effective.
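
To make the remark on hyperexponential approximations concrete, the sketch below computes the mean and squared coefficient of variation (SCV) of a hyperexponential mixture, in which a job draws an exponential service time with rate `rates[i]` with probability `probs[i]`. The two-branch parameters are hypothetical and are not fitted to any particular heavy-tailed distribution; they only show how mixing a fast and a slow exponential branch yields variability well above that of a single exponential (whose SCV is 1).

```python
def hyperexp_moments(probs, rates):
    """Mean and squared coefficient of variation of a hyperexponential mixture."""
    if abs(sum(probs) - 1.0) > 1e-12:
        raise ValueError("branch probabilities must sum to 1")
    if any(r <= 0 for r in rates):
        raise ValueError("rates must be positive")
    mean = sum(p / r for p, r in zip(probs, rates))
    # Second moment of an exponential with rate r is 2 / r**2.
    second_moment = sum(2.0 * p / r**2 for p, r in zip(probs, rates))
    scv = second_moment / mean**2 - 1.0
    return mean, scv


# Hypothetical example: 90% short jobs (rate 10), 10% long jobs (rate 0.1).
mean, scv = hyperexp_moments([0.9, 0.1], [10.0, 0.1])
print(f"mean = {mean:.2f}, SCV = {scv:.2f}")
```

Each branch is itself exponential, so a model built from such mixtures keeps the exponential structure assumed in the analysis while capturing much higher variability.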

In open networks, TP policies "steer" the state of the system
toward a fixed target $\theta$, where distance is measured using a
weighted norm with weight vector $\beta$. We demonstrated that TP
policies are stable for any $\theta$ under an L2 norm with weight
vector $\beta > 0$. Hence, they are safe to implement even if the
parameter vector $(\theta, \beta)$ is not optimally selected (as long as
$\beta > 0$). In closed networks, TP policies "steer" the instantaneous
throughput of the various classes toward a fixed target
$\theta$, where, again, distance is measured using a weighted norm
with weight vector $\beta$. We showed that appropriate target selection
leads to the efficiency of the corresponding policy, meaning
that the policy achieves maximum bottleneck throughput in the
infinite population limit. In both open and closed networks, the
proposed policies are amenable to distributed implementation
using local state information.

In open networks, our numerical results suggest that the
polyhedral relaxations of achievable performance of [1] contain
enough information to yield good targets $\theta$, especially in
light to moderate load conditions. This might be sufficient in
many practical situations involving sensor networks, where
performance considerations would lead capacity planners to
avoid heavy loads. In closed networks, an optimal solution
to the achievable region LP of [1] leads to effective policies,
especially for large populations. This is useful in applications to
processing clusters in sensor networks that block jobs above
a certain threshold to avoid performance degradation; during
heavy traffic conditions the cluster can be modeled as a closed
network and the population will typically be large.

The performance of the proposed class of TP policies can
be further improved by optimizing over the parameter vector
$(\theta, \beta)$; we outlined how this can be done using simulation-based
methods. Overall, as our numerical results indicate, we obtain
near-optimal policies (when the optimal can be computed) and
significantly outperform heuristic alternatives.

We close by noting that although we derived our results for
networks where nodes can preempt a job to accept another, TP
policies can also be implemented in a nonpreemptive setting
with arbitrarily distributed service times. To that end, nodes can
make decisions only at service completions by minimizing an
expectation along the lines of Section IV, conditioning, however,
not only on the current number of jobs but also on the times
elapsed since the most recent service completions at other
nodes.